Mathematics & Statistics Theses & Dissertations

A Statistical Model to Determine Multiple Binding Sites of a Transcription Factor on DNA Using ChIP-seq Data

Rasika Jayatillake, Old Dominion University

Date of Award

Summer 2012

Document Type

Dissertation

Degree Name

Doctor of Philosophy (PhD)

Department

Mathematics & Statistics

Program/Concentration

Computational and Applied Mathematics

Committee Director

Nak-Kyeong Kim

Committee Member

N. Rao Chaganty

Committee Member

Dayanand N. Naik

Committee Member

Jing He

Abstract

Protein-DNA interaction is vital to many biological processes in cells, such as cell division, embryo development, and the regulation of gene expression. Chromatin immunoprecipitation followed by massively parallel sequencing (ChIP-seq) is a new technology that can reveal protein binding sites in the genome with superior accuracy. Although many methods have been proposed to find binding sites from ChIP-seq data, they can find only one binding site within a short region of the genome. In this study we introduce a statistical model to identify multiple binding sites of a transcription factor within a short region of the genome using ChIP-seq data. Mapped sequence reads from the ChIP-seq experiments are modeled as the sum of observations from an unknown number of Poisson distributions. The rate parameters of these Poisson distributions are treated as a function of the underlying distribution of the tags, which depends on the locations of the binding sites and their intensity parameters. Two major approaches to estimating the model parameters are discussed: a Bayesian method and the EM algorithm. For the Bayesian method, reversible jump Markov chain Monte Carlo (RJMCMC) is used for computation. An extensive simulation study was performed to select proposal methods and priors in RJMCMC and to compare model selection criteria in the EM algorithm. Real ChIP-seq datasets for the transcription factors STAT1 and ZNF143 were used to demonstrate the performance of the proposed model. The results from the multiple binding sites model were compared with those of existing peak-calling programs.
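As a rough illustration of the modeling idea in the abstract, the sketch below simulates tag counts along a short region as Poisson observations whose rate is a background level plus one contribution per binding site, then compares the log-likelihood of a two-site description against a one-site description. It is not the dissertation's model: the Gaussian-shaped tag profile, the `width` and `background` values, and all function names are assumptions made only for this example, and no RJMCMC or EM estimation is attempted.

```python
import math
import random


def rates(positions, sites, intensities, width=20.0, background=0.5):
    """Poisson rate at each position: background plus one bump per site.

    The Gaussian bump shape, width, and background level are illustrative
    assumptions, not the tag distribution used in the dissertation.
    """
    out = []
    for x in positions:
        lam = background
        for site, intensity in zip(sites, intensities):
            lam += intensity * math.exp(-0.5 * ((x - site) / width) ** 2)
        out.append(lam)
    return out


def sample_poisson(lam, rng):
    """Draw one Poisson variate with Knuth's multiplication method.

    Adequate for the small rates used here; it slows down for large rates.
    """
    limit = math.exp(-lam)
    k, p = 0, rng.random()
    while p > limit:
        k += 1
        p *= rng.random()
    return k


def log_likelihood(counts, lams):
    """Sum of independent Poisson log-probabilities."""
    return sum(
        c * math.log(lam) - lam - math.lgamma(c + 1)
        for c, lam in zip(counts, lams)
    )


# Simulate a 400 bp region with two nearby (hypothetical) binding sites.
rng = random.Random(0)
positions = list(range(400))
true_sites = [150, 230]
true_intensities = [6.0, 4.0]
counts = [
    sample_poisson(lam, rng)
    for lam in rates(positions, true_sites, true_intensities)
]

# Compare a two-site description with a single site placed between them.
ll_two = log_likelihood(counts, rates(positions, [150, 230], [6.0, 4.0]))
ll_one = log_likelihood(counts, rates(positions, [190], [10.0]))
print("two-site log-likelihood:", round(ll_two, 1))
print("one-site log-likelihood:", round(ll_one, 1))
```

In this toy setting the two-site description should fit the simulated counts better than the single merged site, which is the situation the abstract says single-site peak callers cannot resolve. Choosing the number of sites from data, as the dissertation does with RJMCMC or with model selection criteria under EM, is outside the scope of this sketch.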

Rights

In Copyright. URI: http://rightsstatements.org/vocab/InC/1.0/ This Item is protected by copyright and/or related rights. You are free to use this Item in any way that is permitted by the copyright and related rights legislation that applies to your use. For other uses you need to obtain permission from the rights-holder(s).

DOI

10.25777/jx2b-6k93

ISBN

9781267668325

Recommended Citation

Jayatillake, Rasika. "A Statistical Model to Determine Multiple Binding Sites of a Transcription Factor on DNA Using ChIP-seq Data" (2012). Doctor of Philosophy (PhD), Dissertation, Mathematics & Statistics, Old Dominion University, DOI: 10.25777/jx2b-6k93
https://digitalcommons.odu.edu/mathstat_etds/28

Included in

Applied Statistics Commons, Bioinformatics Commons
