Document Type

Conference Paper

Publication Date

2023

DOI

10.1145/3584371.3612947

Publication Title

Proceedings of the 14th ACM International Conference on Bioinformatics, Computational Biology, and Health Informatics

Pages

102 (8 pp.)

Conference Name

The 14th ACM International Conference on Bioinformatics, Computational Biology, and Health Informatics, September 3-6, 2023, Houston, Texas

Abstract

A growing number of deep learning approaches have been proposed to segment secondary structures from cryo-electron density maps in the medium resolution range (5-10 Å). Although these approaches show great potential, only a few small experimental data sets have been used to test them. There is limited understanding of the factors in the data that affect segmentation performance. We propose an approach to generate data sets with desired specifications in three potential factors: protein sequence identity, structural content, and data quality. The approach was implemented and used to generate a test set and various training sets for studying the effect of secondary structure content and data quality on the performance of DeepSSETracer, a deep learning method that segments regions of protein secondary structures from cryo-EM map components. Results show that both the level of secondary structure content and the data quality influence the segmentation performance of DeepSSETracer.

Rights

© 2023 held by the owner/authors.

This work is licensed under a Creative Commons Attribution 4.0 International (CC BY 4.0) License.

Original Publication Citation

Nguyen, T., Mu, Y., Sun, J., & He, J. (2023). An approach to developing benchmark datasets for protein secondary structure segmentation from Cryo-EM density maps. In Proceedings of the 14th ACM International Conference on Bioinformatics, Computational Biology, and Health Informatics (Article 102). Association for Computing Machinery. https://doi.org/10.1145/3584371.3612947

ORCID

0009-0000-8905-7553 (Sun)
