33 - DNA Methylation Prediction

Description/Abstract/Artist Statement

Deep learning has become an essential tool for deciphering genomic sequences and predicting regulatory activities in cellular biology. However, training a deep neural network to optimal performance on prediction tasks in functional genomics remains a challenge because of the complexity of the regulatory biology encoded in DNA sequence. In this study, we explore the impact of multiple key network training parameters, including the optimization algorithm, training objective, input context length, and network architecture, on model performance for DNA methylation prediction from DNA sequence. Our results show that deeper architectures improve predictive accuracy but require careful selection of optimization algorithms to prevent overfitting. We also find that increasing input context length enhances model performance up to a certain threshold, beyond which returns diminish. Additionally, the choice of loss function significantly influences model stability and generalization, with certain formulations better capturing sequence-level dependencies. By systematically evaluating these factors across multiple model architectures, we identify approaches that enhance predictive accuracy and consistency. Our findings provide insights into the trade-offs among model depth, optimization technique, loss function selection, and sequence length, offering a framework for improving deep learning applications in genomics. This work contributes to the development of more effective computational tools for analyzing regulatory sequences and understanding their role in gene regulation and disease.
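To make the notion of input context length concrete, the sketch below extracts a fixed-size window of bases centered on a candidate CpG site and one-hot encodes it, so that varying `context_length` changes how much flanking sequence a model would see. This is only an illustration under stated assumptions: the function names `extract_window` and `one_hot`, the `N` padding at sequence edges, and the four-channel encoding are hypothetical choices made here and are not taken from the study's actual preprocessing.

```python
# Illustrative sketch only: the windowing and encoding below are assumptions,
# not the preprocessing used in the study.
BASES = "ACGT"


def extract_window(sequence, center, context_length):
    """Return context_length bases around `center`, padding with 'N' past the sequence ends."""
    half = context_length // 2
    start = center - half
    end = start + context_length
    left_pad = "N" * max(0, -start)
    right_pad = "N" * max(0, end - len(sequence))
    return left_pad + sequence[max(0, start):min(len(sequence), end)] + right_pad


def one_hot(window):
    """Encode each base as a 4-element list; unknown bases such as 'N' become all zeros."""
    return [[1 if base == b else 0 for b in BASES] for base in window.upper()]


if __name__ == "__main__":
    seq = "TTACGGA"       # toy sequence, made up for this example
    cpg_position = 3      # index of the 'C' in the CpG dinucleotide
    for context_length in (3, 5, 9):
        window = extract_window(seq, cpg_position, context_length)
        print(context_length, window, len(one_hot(window)))
```

In a sweep over context lengths, the same site would be re-encoded at each window size and passed to otherwise identical models, which is one plausible way to observe where additional flanking sequence stops helping.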

Presenting Author Name/s

Bashar Fakhreddin

Faculty Advisor/Mentor

Jiangwen Sun

Faculty Advisor/Mentor Department

Computer Science

College Affiliation

College of Sciences

Presentation Type

Poster
