Document Type

Article

Publication Date

2024

DOI

10.1038/s41467-024-49912-8

Publication Title

Nature Communications

Volume

15

Issue

1

Pages

5573 (1-14)

Abstract

Recent advancements in genome assembly have greatly improved the prospects for comprehensive annotation of Transposable Elements (TEs). However, existing methods for TE annotation using genome assemblies suffer from limited accuracy and robustness, requiring extensive manual editing. In addition, the currently available gold-standard TE databases are not comprehensive, even for extensively studied species, highlighting the critical need for an automated TE detection method to supplement existing repositories. In this study, we introduce HiTE, a fast and accurate dynamic boundary adjustment approach designed to detect full-length TEs. The experimental results demonstrate that HiTE outperforms RepeatModeler2, the state-of-the-art tool, across various species. Furthermore, HiTE has identified numerous novel transposons with well-defined structures containing protein-coding domains, some of which are directly inserted within crucial genes, leading to direct alterations in gene expression. A Nextflow version of HiTE is also available, with enhanced parallelism, reproducibility, and portability.

Rights

© 2024 The Authors.

This article is licensed under a Creative Commons Attribution 4.0 International (CC BY 4.0) License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

Data Availability

Article states: "The reference genomes for nine species, including Oryza sativa (assembly IRGSP-1.0), Caenorhabditis briggsae (assembly CB4), Drosophila melanogaster (assembly Release 6 plus ISO1 MT), Danio rerio (assembly GRCz11), Zea mays (assembly Zm-B73-REFERENCE-NAM-5.0), Arabidopsis thaliana (assembly TAIR10.1), Gallus gallus (assembly GCF_000002315.5), Taeniopygia guttata (assembly GCF_000151805.1), and Mus musculus (assembly GCA_000001635.2), can be accessed through NCBI GenBank [https://www.ncbi.nlm.nih.gov/genome/]. The other rice genome (Oryza sativa L. ssp. japonica cv. “Nipponbare” v. MSU7) used in the Ghd2 gene experiment of this study, as well as its annotation with respect to both genes and repeats, can be accessed through the Rice Genome Annotation Project [http://rice.uga.edu/]. The telomere-to-telomere assembly of the maize, rice, and Arabidopsis genomes used in this study can be found in CyVerse [https://data.cyverse.org/dav-anon/iplant/home/laijs/Zm-Mo17-REFERENCE-CAU-2.0/], RiceSuperPIRdb [http://www.ricesuperpir.com/web/download], and GitHub [https://github.com/schatzlab/Col-CEN/tree/main/v1.2]. The curated TE libraries used in this study can be accessed through a paid subscription to Repbase [https://www.girinst.org/repbase/]. Additionally, the TE libraries and novel transposons generated in this study are publicly available in the GitHub repository CSU-KangHu/TE_annotation [https://github.com/CSU-KangHu/TE_annotation] and Zenodo [72]."

Original Publication Citation

Hu, K., Ni, P., Xu, M., Zou, Y., Chang, J., Gao, X., Li, Y., Ruan, J., Hu, B., & Wang, J. (2024). HiTE: A fast and accurate dynamic boundary adjustment approach for full-length transposable element detection and annotation. Nature Communications, 15(1), 1-14, Article 5573. https://doi.org/10.1038/s41467-024-49912-8

HiTESupplInfo.pdf (3378 kB)
Supplementary Information

HiTEPeerReview.pdf (1655 kB)
Peer Review File

HiTESupplementaryFileDesc.pdf (83 kB)
Description of Additional Supplementary Files

41467_2024_49912_MOESM4_ESM (1).xlsx (72 kB)
Supplementary Data1

41467_2024_49912_MOESM5_ESM.xlsx (15 kB)
Supplementary Data2

HiTEReportingData.pdf (1952 kB)
Reporting Summary
