Document Type
Article
Publication Date
2024
DOI
10.1007/s13253-024-00665-3
Publication Title
Journal of Agricultural, Biological and Environmental Statistics
Volume
Article in Press
Pages
20 pp.
Abstract
RNA-sequencing (RNA-seq) technology allows for the identification of differentially expressed genes, which are genes whose mean transcript abundance levels vary across conditions. In practice, RNA-seq datasets often include covariates that are of primary interest in addition to a set of covariates that are subject to selection. Some of these covariates may be relevant to gene expression levels, while others may be irrelevant. Ignoring relevant covariates or attempting to adjust for the effect of irrelevant covariates can compromise the identification of differentially expressed genes. To address this issue, we propose a variable selection method that uses pseudo-variables to control the expected proportion of selected covariates that are irrelevant. Our method accurately selects relevant covariates while keeping the false selection rate below a specified level. We demonstrate that our method outperforms existing methods for detecting differentially expressed genes when working with available covariates. Our method is implemented in the FSRAnalysisBS function of the R package csrnaseq, which is available at www.github.com/ntyet/csrnaseq. The analysis and simulation are available at www.github.com/ntyet/csrnaseq/tree/main/analysis.
Rights
© 2024 The Authors.
This article is licensed under a Creative Commons Attribution 4.0 International (CC BY 4.0) License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original authors and the source, provide a link to the Creative Commons license, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.
Original Publication Citation
Nguyen, Y., & Nettleton, D. (2024). Identifying relevant covariates in RNA-seq analysis by pseudo-variable augmentation. Journal of Agricultural, Biological and Environmental Statistics. Advance online publication. https://doi.org/10.1007/s13253-024-00665-3
ORCID
0000-0003-4881-3476 (Nguyen)
Repository Citation
Nguyen, Yet and Nettleton, Dan, "Identifying Relevant Covariates in RNA-seq Analysis by Pseudo-Variable Augmentation" (2024). Mathematics & Statistics Faculty Publications. 277.
https://digitalcommons.odu.edu/mathstat_fac_pubs/277
Included in
Data Science Commons, Genetics and Genomics Commons, Investigative Techniques Commons, Mathematics Commons