Document Type

Article

Publication Date

12-2022

DOI

10.1186/s12859-022-05081-3

Publication Title

BMC Bioinformatics

Volume

23

Pages

545 (16 pp.)

Abstract

Background: Human subtelomeric DNA regulates the length and stability of adjacent telomeres that are critical for cellular function, and contains many gene/pseudogene families. Large evolutionarily recent segmental duplications and associated structural variation in human subtelomeres have made complete sequencing and assembly of these regions difficult to impossible for many loci, complicating or precluding a wide range of genetic analyses to investigate their function.

Results: We present a hybrid assembly method, NanoPore Guided REgional Assembly Tool (NPGREAT), which combines Linked-Read data with mapped ultralong nanopore reads spanning subtelomeric segmental duplications to potentially overcome these difficulties. Linked-Read sets of DNA sequences identified by matches with 1-copy subtelomere sequence adjacent to segmental duplications are assembled and extended into the segmental duplication regions using Regional Extension of Assemblies using Linked-Reads (REXTAL). Mapped telomere-containing ultralong nanopore reads are then used to provide contiguity and correct orientation for matching REXTAL sequence contigs as well as identification/correction of any misassemblies. Our method was tested for a subset of representative subtelomeres with ultralong nanopore read coverage in the haploid human cell line CHM13. A 10X Linked-Read dataset from CHM13 was combined with ultralong nanopore reads from the same genome to provide improved subtelomere assemblies. Comparison of Nanopore-only assemblies using SHASTA with our NPGREAT assemblies in the distal-most subtelomere regions showed that NPGREAT produced higher-quality and more complete assemblies than SHASTA alone when these regions had low ultralong nanopore coverage (such as cases where large segmental duplications were immediately adjacent to (TTAGGG)n tracts).

Conclusion: In genomic regions with large segmental duplications adjacent to telomeres, NPGREAT offers an alternative economical approach to improving assembly accuracy and coverage using linked-read datasets when more expensive HiFi datasets of 10–20 kb reads are unavailable.

Rights

© The Author(s) 2022. Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data.

Data Availability

The NPGREAT software and the datasets generated/analyzed during the current study are available in the repository: https://github.com/eleniadam/npgreat. The ultralong Nanopore reads (rel8) and the 10X Linked-Reads for the CHM13 cell line are available from the “Telomere-to-Telomere” (T2T) Consortium at https://github.com/marbl/CHM13.

Original Publication Citation

Adam, E., Ranjan, D., & Riethman, H. (2022). NPGREAT: Assembly of the human subtelomere regions with the use of ultralong nanopore reads and linked reads. BMC Bioinformatics 23, Article 545. https://doi.org/10.1186/s12859-022-05081-3

ORCID

0000-0002-7548-4375 (Adam)
