Document Type

Article

Publication Date

2024

DOI

10.1007/s13253-024-00665-3

Publication Title

Journal of Agricultural, Biological and Environmental Statistics

Volume

Article in Press

Pages

20 pp.

Abstract

RNA-sequencing (RNA-seq) technology allows for the identification of differentially expressed genes, which are genes whose mean transcript abundance levels vary across conditions. In practice, RNA-seq datasets often include covariates that are of primary interest in addition to a set of covariates that are subject to selection. Some of these covariates may be relevant to gene expression levels, while others may be irrelevant. Ignoring relevant covariates or attempting to adjust for the effect of irrelevant covariates can compromise the identification of differentially expressed genes. To address this issue, we propose a variable selection method that uses pseudo-variables to control the expected proportion of selected covariates that are irrelevant. Our method accurately selects relevant covariates while keeping the false selection rate below a specified level. We demonstrate that our method outperforms existing methods for detecting differentially expressed genes when working with available covariates. Our method is implemented in the FSRAnalysisBS function of the R package csrnaseq, which is available at www.github.com/ntyet/csrnaseq. The analysis and simulation are available at www.github.com/ntyet/csrnaseq/tree/main/analysis.
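
The abstract describes pseudo-variable augmentation only at a high level. As a rough illustration of the general idea, the Python sketch below adds permuted copies of the covariates as pseudo-variables, which are irrelevant by construction. It then uses how often those pseudo-variables pass a selection cutoff to estimate how many real but irrelevant covariates pass it too. This is a toy, single-response sketch under stated assumptions: the helper names (`corr`, `select_with_pseudo_variables`), the absolute-correlation score, the permutation scheme, and the particular false selection rate estimate are all hypothetical, illustrative choices. It is not the FSRAnalysisBS implementation in csrnaseq, which targets RNA-seq analyses.

```python
import random


def corr(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / (sxx * syy) ** 0.5


def select_with_pseudo_variables(X, y, n_pseudo=50, gamma0=0.1, seed=0):
    """Hypothetical toy helper: pseudo-variable selection for one response.

    X is a list of covariate columns and y is the response. Returns the
    indices of the real covariates kept at the loosest cutoff whose
    estimated false selection rate is at most gamma0.
    """
    rng = random.Random(seed)
    k_real = len(X)
    # Pseudo-variables (an assumption of this sketch): permuted copies of the
    # real columns, which keep their marginal distributions but are unrelated
    # to y by construction.
    pseudo = []
    for j in range(n_pseudo):
        col = list(X[j % k_real])
        rng.shuffle(col)
        pseudo.append(col)

    # Illustrative association score: absolute correlation with the response.
    real_scores = [abs(corr(col, y)) for col in X]
    pseudo_scores = [abs(corr(col, y)) for col in pseudo]

    chosen = []
    # Scan cutoffs from strictest to loosest; the selected sets are nested.
    for t in sorted(set(real_scores), reverse=True):
        selected = [i for i, s in enumerate(real_scores) if s >= t]
        s_real = len(selected)
        # Fraction of pseudo-variables passing the cutoff, used as an estimate
        # of the rate at which irrelevant covariates pass it.
        theta = sum(s >= t for s in pseudo_scores) / n_pseudo
        # Estimated number of irrelevant covariates selected, over (1 + number selected).
        est_fsr = (k_real - s_real) * theta / (1 + s_real)
        if est_fsr <= gamma0:
            chosen = selected
    return chosen
```

Permuting a real column keeps its marginal distribution while breaking any link to the response, which is why permuted copies are a convenient choice of pseudo-variable in this toy version; independently generated noise columns could be substituted.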

Rights

© 2024 The Authors.

This article is licensed under a Creative Commons Attribution 4.0 International (CC BY 4.0) License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original authors and the source, provide a link to the Creative Commons license, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

Original Publication Citation

Nguyen, Y., & Nettleton, D. (2024). Identifying relevant covariates in RNA-seq analysis by pseudo-variable augmentation. Journal of Agricultural, Biological and Environmental Statistics. Advance online publication. https://doi.org/10.1007/s13253-024-00665-3

ORCID

0000-0003-4881-3476 (Nguyen)

Supplementary Materials

13253_2024_665_MOESM1_ESM.pdf (505 kB)
