Date of Award

Fall 12-2021

Document Type

Dissertation

Degree Name

Doctor of Philosophy (PhD)

Department

Mathematics & Statistics

Program/Concentration

Computational and Applied Mathematics

Committee Director

N. Rao Chaganty

Committee Member

Lucia Tabacu

Committee Member

Sinjini Sikdar

Committee Member

Hadiza Galadima

Abstract

Deoxyribonucleic acid, more commonly known as DNA, is a complex double-helix-shaped molecule present in all living organisms that carries thousands of genes. However, only a few of these genes are differentially expressed and play a vital role in a particular disease, such as breast cancer. Microarray technology is one of the modern tools developed to study gene expression. Two major microarray technologies are available for expression analysis: spotted cDNA arrays and oligonucleotide arrays. Our research focuses on the statistical analysis of data arising from spotted cDNA microarrays. Numerous models have been proposed in the literature to identify differentially expressed genes from the red and green intensities measured by cDNA microarrays. Motivated by the Bayesian models described in Newton et al. (2001) and Mav and Chaganty (2004), we propose two models for the joint distribution of the red and green intensities. Both use a Gaussian copula to account for the dependence between the two intensities, and both assume gamma-distributed marginals. Under the first copula model, differentially expressed genes are identified by computing Bayes estimates of differential expression. The second copula model incorporates a latent Bernoulli variable that indicates differential expression. For this model, we apply the EM algorithm to compute the posterior probabilities of differential expression, which are then used to rank the genes. We conducted two simulation studies to assess parameter estimation in the Gaussian copula-based models. We show that our models improve upon those of Newton et al. (2001) and Mav and Chaganty (2004). We also study Weibull marginals in place of gamma marginals. Our analysis shows that the copula models with Weibull marginals provide a better fit and improve the identification of differentially expressed genes.
Finally, we illustrate the application of our models to samples of Escherichia coli microarray data.
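The two ingredients described above can be sketched in code. The first is a joint density for the red (R) and green (G) intensities built from a Gaussian copula, f(r, g) = c(F_R(r), F_G(g); ρ) f_R(r) f_G(g). The second is EM used to turn that density into posterior probabilities of differential expression. This is a minimal illustration under stated assumptions, not the dissertation's implementation. It uses Weibull marginals, which the abstract reports studying as an alternative to gamma, because the Weibull CDF has a closed form and the code then needs only the Python standard library. A gamma version would swap in a gamma CDF and density, for example from `scipy.stats.gamma`. Several elements are hypothetical simplifications: the helper names (`joint_density`, `posterior_de`, etc.), the parameter values, and the choice to hold both mixture components fixed while EM updates only the mixing weight. A full fit would also estimate the marginal and copula parameters.

```python
import math
from statistics import NormalDist

_PHI = NormalDist()  # standard normal; _PHI.inv_cdf is the quantile function


def weibull_pdf(x, shape, scale):
    """Weibull density (k / lam) * (x / lam)**(k - 1) * exp(-(x / lam)**k), x > 0."""
    z = x / scale
    return (shape / scale) * z ** (shape - 1.0) * math.exp(-(z ** shape))


def weibull_cdf(x, shape, scale):
    """Weibull distribution function 1 - exp(-(x / lam)**k)."""
    return -math.expm1(-((x / scale) ** shape))


def gaussian_copula_density(u, v, rho):
    """Bivariate Gaussian copula density c(u, v; rho); needs 0 < u, v < 1 and |rho| < 1."""
    z1, z2 = _PHI.inv_cdf(u), _PHI.inv_cdf(v)
    s = 1.0 - rho * rho
    exponent = -(rho * rho * (z1 * z1 + z2 * z2) - 2.0 * rho * z1 * z2) / (2.0 * s)
    return math.exp(exponent) / math.sqrt(s)


def joint_density(r, g, r_params, g_params, rho):
    """f(r, g) = c(F_R(r), F_G(g); rho) * f_R(r) * f_G(g) with Weibull marginals.

    r_params and g_params are (shape, scale) tuples.
    """
    u = weibull_cdf(r, *r_params)
    v = weibull_cdf(g, *g_params)
    return gaussian_copula_density(u, v, rho) * weibull_pdf(r, *r_params) * weibull_pdf(g, *g_params)


def posterior_de(pairs, null, de, p=0.5, n_iter=100):
    """EM for the mixing weight p = P(gene is DE) with both components held fixed.

    null and de are hypothetical (r_params, g_params, rho) tuples.
    Returns the final p and each gene's posterior probability of DE.
    """
    f0 = [joint_density(r, g, *null) for r, g in pairs]
    f1 = [joint_density(r, g, *de) for r, g in pairs]

    def e_step(weight):
        # E-step: posterior P(DE | r, g) by Bayes' rule for a two-component mixture.
        return [weight * a / (weight * a + (1.0 - weight) * b) for a, b in zip(f1, f0)]

    for _ in range(n_iter):
        post = e_step(p)
        p = sum(post) / len(post)  # M-step: new weight is the mean posterior
    return p, e_step(p)


# Hypothetical (red, green) intensities and component parameters, for illustration only.
pairs = [(1.0, 1.1), (0.9, 2.5), (1.2, 1.0), (0.8, 3.0)]
null = ((1.5, 1.0), (1.5, 1.0), 0.5)  # red and green share a scale
de = ((1.5, 1.0), (1.5, 2.5), 0.5)    # green scale shifted upward
p_hat, post = posterior_de(pairs, null, de)
ranking = sorted(range(len(pairs)), key=lambda i: post[i], reverse=True)
print(p_hat, ranking)
```

For a fixed p, the posterior probability of differential expression is increasing in the likelihood ratio f_1/f_0. In this simplified setting, ranking genes by posterior probability is therefore the same as ranking them by that ratio.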

DOI

10.25777/tjnp-th94

ISBN

9798762197205

ORCID

0000-0002-0131-842X
