Date of Award

Summer 2012

Document Type


Degree Name

Doctor of Philosophy (PhD)


Mathematics & Statistics


Computational and Applied Mathematics

Committee Director

Nak-Kyeong Kim

Committee Member

N. Rao Chaganty

Committee Member

Dayanand N. Naik

Committee Member

Jing He


Protein-DNA interaction is vital to many biological processes in cells, such as cell division, embryo development, and the regulation of gene expression. Chromatin immunoprecipitation followed by massively parallel sequencing (ChIP-seq) is a new technology that can reveal protein binding sites in the genome with superior accuracy. Although many methods have been proposed to find binding sites from ChIP-seq data, they can find only one binding site within a short region of the genome. In this study we introduce a statistical model that identifies multiple binding sites of a transcription factor within a short region of the genome using ChIP-seq data. Mapped sequence reads from ChIP-seq experiments are modeled as the sum of observations from an unknown number of Poisson distributions. The rate parameters of these Poisson distributions are treated as functions of the underlying distribution of the tags, which depends on the locations of the binding sites and their intensity parameters. Two major approaches to estimating the model parameters are discussed: a Bayesian method and the EM algorithm. For the Bayesian method, reversible jump Markov chain Monte Carlo (RJMCMC) is used for computation. An extensive simulation study was performed to select proposal methods and priors in RJMCMC and to compare model selection criteria in the EM algorithm. Real ChIP-seq datasets for the transcription factors STAT1 and ZNF143 were used to demonstrate the performance of the proposed model. The results from the multiple binding sites model were compared with those of existing peak-calling programs.
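
The idea of modeling read counts as a sum of Poisson observations can be illustrated with a small simulation. The sketch below is not the dissertation's model: the Gaussian-shaped tag profile, the constant background rate, and the names `site_profile`, `poisson_rates`, `sample_poisson`, and `simulate_counts` are all assumptions made for illustration. It relies only on the standard fact that a sum of independent Poisson variables is itself Poisson, with rate equal to the sum of the individual rates.

```python
import math
import random


def site_profile(x, site, width):
    """Hypothetical Gaussian-shaped tag profile around one binding site (an assumption)."""
    return math.exp(-0.5 * ((x - site) / width) ** 2)


def poisson_rates(positions, sites, intensities, background=0.1, width=20.0):
    """Rate at each position: assumed background plus the sum of per-site contributions."""
    return [
        background
        + sum(a * site_profile(x, s, width) for s, a in zip(sites, intensities))
        for x in positions
    ]


def sample_poisson(rate, rng):
    """Draw one Poisson variate with Knuth's multiplication method (suitable for small rates)."""
    threshold = math.exp(-rate)
    k, p = 0, 1.0
    while True:
        p *= rng.random()
        if p <= threshold:
            return k
        k += 1


def simulate_counts(positions, sites, intensities, seed=0):
    """Simulate a read count at each position from the summed Poisson rate."""
    rng = random.Random(seed)
    rates = poisson_rates(positions, sites, intensities)
    return [sample_poisson(r, rng) for r in rates]


if __name__ == "__main__":
    # Two illustrative binding sites 60 bp apart with different intensities.
    positions = list(range(200))
    counts = simulate_counts(positions, sites=[100, 160], intensities=[5.0, 3.0])
    print(sum(counts), "simulated reads over", len(positions), "positions")
```

In this toy setting, estimation would mean recovering the number of sites, their locations, and their intensities from `counts` alone, which is the task the abstract assigns to RJMCMC and the EM algorithm.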


In Copyright. This Item is protected by copyright and/or related rights. You are free to use this Item in any way that is permitted by the copyright and related rights legislation that applies to your use. For other uses you need to obtain permission from the rights-holder(s).