Date of Award

Spring 2015

Document Type

Dissertation

Degree Name

Doctor of Philosophy (PhD)

Department

Mathematics and Statistics

Program/Concentration

Computational and Applied Mathematics

Committee Director

Norou Diawara

Committee Director

Nak-Kyeong Kim

Committee Member

N. Rao Chaganty

Committee Member

Michael Doviak

Abstract

Determining protein-DNA binding sites is essential to understanding many biological processes. A transcription factor is a particular type of protein that binds to DNA and controls gene regulation in living organisms. Chromatin immunoprecipitation followed by high-throughput sequencing (ChIP-seq) is considered the gold standard for locating these binding sites, and the programs used to identify DNA-transcription factor binding sites are known as peak-callers. ChIP-seq data are known to exhibit considerable background noise and other biases. In this study, we propose a negative binomial (NB) model, a zero-inflated Poisson (ZIP) model, and a zero-inflated negative binomial (ZINB) model for peak-calling. Using real ChIP-seq datasets, we show that the ZINB model is the best of the three for ChIP-seq data. We then incorporate control data, GC count information, and mappability information into the ZINB regression model as covariates using two link functions. We implemented this approach in C++, and our peak-caller chooses the optimal parameter combination for a given dataset. The performance of our approach is compared with two frequently used peak-callers: QuEST and MACS.
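
As an illustration of the ZINB model named in the abstract, the following is a minimal sketch of a zero-inflated negative binomial probability mass function. It is not the dissertation's C++ implementation. The mean/dispersion parameterization and the names `nb_log_pmf`, `zinb_pmf`, `mu`, `size`, and `pi` are assumptions made for this example; the abstract does not specify them.

```python
import math


def nb_log_pmf(k, mu, size):
    """Log pmf of a negative binomial with mean mu and dispersion size.

    Assumed parameterization: variance = mu + mu**2 / size.
    """
    return (
        math.lgamma(k + size) - math.lgamma(size) - math.lgamma(k + 1)
        + size * math.log(size / (size + mu))
        + k * math.log(mu / (size + mu))
    )


def zinb_pmf(k, mu, size, pi):
    """Pmf of a zero-inflated NB: with probability pi the count is a
    structural zero, otherwise it is drawn from NB(mu, size)."""
    nb = math.exp(nb_log_pmf(k, mu, size))
    if k == 0:
        return pi + (1.0 - pi) * nb
    return (1.0 - pi) * nb
```

Setting `pi = 0` recovers the plain NB model, and the extra mass at zero is what lets the model absorb the many empty windows typical of ChIP-seq read counts.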

DOI

10.25777/1s51-cs87

ISBN

9781321843439
