Date of Award

Spring 2015

Document Type

Dissertation

Degree Name

Doctor of Philosophy (PhD)

Department

Mathematics & Statistics

Program/Concentration

Computational and Applied Mathematics

Committee Director

Norou Diawara

Committee Director

Nak-Kyeong Kim

Committee Member

N. Rao Chaganty

Committee Member

Michael Doviak

Abstract

Determining protein-DNA binding sites is essential to understanding many biological processes. A transcription factor is a particular type of protein that binds to DNA and controls gene regulation in living organisms. Chromatin immunoprecipitation followed by high-throughput sequencing (ChIP-seq) is considered the gold standard for locating these binding sites, and programs used to identify DNA-transcription factor binding sites are known as peak-callers. ChIP-seq data are known to exhibit considerable background noise and other biases. In this study, we propose a negative binomial (NB) model, a zero-inflated Poisson (ZIP) model, and a zero-inflated negative binomial (ZINB) model for peak-calling. Using real ChIP-seq datasets, we show that the ZINB model is the best model for ChIP-seq data. We then incorporate control data, GC count information, and mappability information into the ZINB regression model as covariates using two link functions. We implemented this approach in C++, and our peak-caller chooses the optimal parameter combination for a given dataset. The performance of our approach is compared with two frequently used peak-callers: QuEST and MACS.
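
For readers unfamiliar with zero-inflated count models, the following is a minimal Python sketch of a ZINB probability mass function with covariates. It is not the dissertation's C++ implementation. The function names, the mean/size parameterization of the negative binomial, and the choice of a log link for the mean and a logit link for the zero-inflation probability are all assumptions made for illustration; the abstract does not say which two link functions were used.

```python
import math


def nb_log_pmf(k, mu, size):
    """Log-pmf of a negative binomial with mean mu > 0 and dispersion size > 0."""
    return (math.lgamma(k + size) - math.lgamma(size) - math.lgamma(k + 1)
            + size * math.log(size / (size + mu))
            + k * math.log(mu / (size + mu)))


def zinb_pmf(k, mu, size, pi):
    """ZINB pmf: with probability pi the count is a structural zero,
    otherwise it is drawn from NB(mu, size)."""
    nb = math.exp(nb_log_pmf(k, mu, size))
    if k == 0:
        return pi + (1.0 - pi) * nb
    return (1.0 - pi) * nb


def zinb_regression_pmf(k, x, beta, gamma, size):
    """Hypothetical regression form: covariates x (for example control reads,
    GC count, mappability) enter through an assumed log link for the mean
    and an assumed logit link for the zero-inflation probability."""
    mu = math.exp(sum(b * xi for b, xi in zip(beta, x)))
    eta = sum(g * xi for g, xi in zip(gamma, x))
    pi = 1.0 / (1.0 + math.exp(-eta))
    return zinb_pmf(k, mu, size, pi)
```

Setting `pi` to zero recovers the plain NB model, and letting `size` grow large approaches the ZIP model, which is why the three candidate models named above can be viewed as one nested family.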

Rights

In Copyright. URI: http://rightsstatements.org/vocab/InC/1.0/ This Item is protected by copyright and/or related rights. You are free to use this Item in any way that is permitted by the copyright and related rights legislation that applies to your use. For other uses you need to obtain permission from the rights-holder(s).

DOI

10.25777/1s51-cs87

ISBN

9781321843439
