Document Type
Conference Paper
Publication Date
2023
DOI
10.1145/3584371.3612947
Publication Title
Proceedings of the 14th ACM International Conference on Bioinformatics, Computational Biology, and Health Informatics
Pages
Article 102 (8 pp.)
Conference Name
The 14th ACM International Conference on Bioinformatics, Computational Biology, and Health Informatics, September 3-6, 2023, Houston, Texas
Abstract
A growing number of deep learning approaches have been proposed to segment secondary structures from cryo-electron microscopy (cryo-EM) density maps in the medium resolution range (5–10 Å). Although these approaches show great potential, they have been tested on only a few small experimental data sets, and there is limited understanding of which factors in the data affect segmentation performance. We propose an approach to generate data sets with desired specifications in three potential factors: protein sequence identity, structural content, and data quality. The approach was implemented and used to generate a test set and various training sets to study the effect of secondary structure content and data quality on the performance of DeepSSETracer, a deep learning method that segments regions of protein secondary structures from cryo-EM map components. Results show that both the level of secondary structure content and the data quality influence the segmentation performance of DeepSSETracer.
Rights
© 2023 held by the owner/authors.
This work is licensed under a Creative Commons Attribution 4.0 International (CC BY 4.0) License.
Original Publication Citation
Nguyen, T., Mu, Y., Sun, J., & He, J. (2023). An approach to developing benchmark datasets for protein secondary structure segmentation from Cryo-EM density maps. In Proceedings of the 14th ACM International Conference on Bioinformatics, Computational Biology, and Health Informatics (Article 102). Association for Computing Machinery. https://doi.org/10.1145/3584371.3612947
ORCID
0009-0000-8905-7553 (Sun)
Included in
Amino Acids, Peptides, and Proteins Commons, Artificial Intelligence and Robotics Commons, Computational Biology Commons