College of Sciences


Ph.D. Computer Science

Publication Date

Spring 2020




Telomeres are vitally important regions located at the tips of chromosomes. Their dysfunction, caused by length shortening, can lead to cellular senescence, which in turn contributes to age-related diseases, including cancer. The subtelomeres, located next to the telomeres, play a critical role in regulating the lengths of the adjacent telomeres. Even after many years of research, human subtelomeres have proven very hard to assemble because of their complex, repeat-rich structure. To overcome these problems, we develop a hybrid assembly method that uses two of the latest available, mutually complementary types of data: Linked-Reads and ultralong Nanopore reads. Our strategy first uses the single-copy region adjacent to a telomere to search the linked-read and Nanopore read datasets for the reads that correspond to the subtelomere region in question. Next, we use the REXTAL (Regional Extension of Assemblies Using Linked-Reads) method to create a set of short-read assemblies from the selected linked-reads. We also develop the NanoPore Guided Regional Assembly Tool (NPGREAT), which assembles the short-read REXTAL assemblies together with the selected ultralong reads. In NPGREAT, the ultralong Nanopore reads serve as scaffolds on which the REXTAL contigs are placed and corrected, replacing low-quality Nanopore sequence with high-quality REXTAL sequence in matching regions. In regions that lack REXTAL coverage, we retain the Nanopore sequence, referred to as “connectors”, which is useful for spacing, orienting, and ordering multiple REXTAL contigs. The output of NPGREAT is a single sequence. We tested NPGREAT on the subtelomeric regions of the human NA12878 genome. The output assemblies have high percent identity to the hg38 reference, differing only in the variable tandem-repeat regions of the sequence. The hybrid NPGREAT method provides, for the first time, high-quality continuous assemblies of human subtelomeric regions.
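The scaffolding idea described above, where contig sequence replaces the matching part of a Nanopore read and uncovered stretches are kept as connectors, can be illustrated with a minimal Python sketch. This is not the NPGREAT implementation. It assumes a single Nanopore read whose REXTAL contig placements are already known. The `Placement` record, its 0-based half-open read coordinates, and the greedy rule that skips overlapping placements are all hypothetical simplifications. In a real pipeline the placements would come from aligning contigs to the read, and correction and multi-read handling would be more involved.

```python
from dataclasses import dataclass

# Complement table for DNA bases (upper and lower case).
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


@dataclass
class Placement:
    """Hypothetical record of where a short-read contig aligns on a long read.

    read_start/read_end are 0-based, half-open coordinates on the read.
    reverse=True means the contig aligns to the reverse strand.
    """
    contig: str
    read_start: int
    read_end: int
    reverse: bool = False


def scaffold_merge(read: str, placements: list[Placement]) -> tuple[str, list[tuple[int, int]]]:
    """Replace placed read segments with contig sequence; keep the rest as connectors.

    Returns the merged sequence and the read intervals kept as connectors.
    Overlaps are resolved greedily in this sketch: a placement that starts
    before the end of the previously accepted one is skipped.
    """
    pieces: list[str] = []
    connectors: list[tuple[int, int]] = []
    cursor = 0  # first read position not yet emitted
    for p in sorted(placements, key=lambda q: (q.read_start, -q.read_end)):
        if p.read_start < cursor:
            continue  # overlaps an accepted contig; skipped
        if p.read_start > cursor:
            # Uncovered stretch: keep the long-read sequence as a connector.
            pieces.append(read[cursor:p.read_start])
            connectors.append((cursor, p.read_start))
        # Covered stretch: use the contig, oriented to match the read.
        pieces.append(reverse_complement(p.contig) if p.reverse else p.contig)
        cursor = p.read_end
    if cursor < len(read):
        # Trailing uncovered stretch is also kept as a connector.
        pieces.append(read[cursor:])
        connectors.append((cursor, len(read)))
    return "".join(pieces), connectors


if __name__ == "__main__":
    read = "AAAAACGACCCCGGTATTTT"  # toy "noisy" long read
    merged, conns = scaffold_merge(read, [
        Placement("ACGT", 4, 8),                 # corrects ACGA -> ACGT
        Placement("GTACC", 12, 16, reverse=True),  # reverse-strand contig
    ])
    print(merged, conns)
```

In this toy run, the first contig fixes a substitution in the read, the reverse-strand contig is reverse-complemented before insertion and also restores a base missing from the read, and the three untouched stretches of the read remain as connectors. The output is one merged sequence.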


Telomeres, REXTAL, Linked-reads, Nanopore reads, NPGREAT, Human subtelomeric regions


Cell Biology | Computer Sciences | Genetics and Genomics | Molecular Biology




NPGREAT: Hybrid Assembly of Human Subtelomeres with the Use of Nanopore and Linked-Read Datasets