Document Type
Article
Publication Date
2024
DOI
10.1038/s41467-024-51891-9
Publication Title
Nature Communications
Volume
15
Issue
1
Pages
7561 (1-20)
Abstract
Single-cell RNA sequencing (scRNA-seq) technologies have become essential tools for characterizing cellular landscapes within complex tissues. Large-scale single-cell transcriptomics holds great potential for identifying rare cell types critical to the pathogenesis of diseases and biological processes. Existing methods for identifying rare cell types often rely on one-time clustering using partial or global gene expression. However, these rare cell types may be overlooked during the clustering phase, posing challenges for their accurate identification. In this paper, we propose a Cluster decomposition-based Anomaly Detection method (scCAD), which iteratively decomposes clusters based on the most differential signals in each cluster to effectively separate rare cell types and achieve accurate identification. We benchmark scCAD on 25 real-world scRNA-seq datasets, demonstrating its superior performance compared to 10 state-of-the-art methods. In-depth case studies across diverse datasets, including mouse airway, brain, intestine, human pancreas, immunology data, and clear cell renal cell carcinoma, showcase scCAD’s efficiency in identifying rare cell types in complex biological scenarios. Furthermore, scCAD can correct the annotation of rare cell types and identify immune cell subtypes associated with disease, thereby offering valuable insights into disease progression.
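The idea of iterative cluster decomposition described in the abstract can be illustrated with a toy sketch. This is not the authors' scCAD implementation: the function names (split_on_top_feature, decompose, rare_cells), the use of the highest-variance feature as the "most differential signal", the largest-gap split rule, and the min_gap and max_fraction thresholds are all assumptions chosen for illustration. The synthetic data are constructed so that the rare group is hidden inside a large cluster at the first split and only separates at the second round.

```python
import numpy as np

def split_on_top_feature(X, idx):
    """Split the cells in `idx` at the largest gap of their highest-variance feature."""
    sub = X[idx]
    j = int(np.argmax(sub.var(axis=0)))       # assumed proxy for "most differential signal"
    vals = np.sort(sub[:, j])
    gaps = np.diff(vals)
    k = int(np.argmax(gaps))
    threshold = (vals[k] + vals[k + 1]) / 2   # midpoint of the widest gap
    mask = sub[:, j] > threshold
    return idx[mask], idx[~mask], gaps[k]

def decompose(X, min_gap=2.0, max_rounds=10):
    """Repeatedly split clusters until no cluster has a gap of at least `min_gap`."""
    clusters = [np.arange(X.shape[0])]
    for _ in range(max_rounds):
        next_clusters, changed = [], False
        for idx in clusters:
            if len(idx) < 2:
                next_clusters.append(idx)
                continue
            high, low, gap = split_on_top_feature(X, idx)
            if gap >= min_gap:
                next_clusters += [high, low]
                changed = True
            else:
                next_clusters.append(idx)
        clusters = next_clusters
        if not changed:
            break
    return clusters

def rare_cells(X, max_fraction=0.05, **kwargs):
    """Return sorted indices of cells in clusters holding at most `max_fraction` of all cells."""
    n = X.shape[0]
    rare = [idx for idx in decompose(X, **kwargs) if len(idx) / n <= max_fraction]
    return np.sort(np.concatenate(rare)) if rare else np.array([], dtype=int)

# Synthetic data (hypothetical): two large groups separated on feature 0,
# plus 4 rare cells that differ from the first group only on feature 1.
common_a = np.column_stack([np.linspace(0.0, 0.5, 50), np.zeros(50)])
rare = np.column_stack([np.full(4, 0.25), np.full(4, 8.0)])
common_b = np.column_stack([np.linspace(10.0, 10.5, 46), np.zeros(46)])
X = np.vstack([common_a, rare, common_b])     # rare cells occupy rows 50-53

print(rare_cells(X))
```

A single global split on this data separates only the two large groups, because feature 0 dominates the overall variance. The rare cells become separable once the split is repeated inside the first group, where feature 1 carries the strongest signal.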
Rights
© 2024 The Authors.
This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License (CC BY-NC-ND 4.0), which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if you modified the licensed material. You do not have permission under this license to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.
Data Availability
Article states: "The details of the datasets used in this study are reported in Supplementary Table 1. All described datasets are obtained from various public websites under accession codes provided in Supplementary Table 1, including NCBI Gene Expression Omnibus (GEO) [https://www.ncbi.nlm.nih.gov/geo/], ArrayExpress [https://www.ebi.ac.uk/arrayexpress/], Sequence Read Archive (SRA) [https://www.ncbi.nlm.nih.gov/sra]. 10X PBMC is obtained at Github [https://github.com/ttgump/scDeepCluster/blob/master/scRNA-seq%20data/10X_PBMC.h5]. 68k PBMC and Jurkat datasets are obtained from the website of 10X genomics ([https://www.10xgenomics.com/datasets/fresh-68-k-pbm-cs-donor-a-1-standard-1-1-0], [https://www.10xgenomics.com/datasets/50-percent-50-percent-jurkat-293-t-cell-mixture-1-standard-1-1-0]). The worm neuron cells dataset Cao is sampled from a dataset obtained from the sci-RNA-seq platform (single-cell combinatorial indexing RNA sequencing) [http://atlas.gs.washington.edu/worm-rna/docs/]. The preprocessed human tonsil data, named Tonsil, and Crohn data are available from Broad Institute Single Cell Portal ([https://singlecell.broadinstitute.org/single_cell/study/SCP2169/slide-tags-snrna-seq-on-human-tonsil], [https://singlecell.broadinstitute.org/single_cell/study/SCP359/ica-ileum-lamina-propria-immunocytes-sinai]). The mouse retina data and B_lymphoma data are available at Github [https://github.com/OSU-BMBL/marsgt/tree/main/Data]. Source data are provided with this paper."
Original Publication Citation
Xu, Y., Wang, S., Feng, Q., Xia, J., Li, Y., Li, H.-D., & Wang, J. (2024). scCAD: Cluster decomposition-based Anomaly Detection for rare cell identification in single-cell expression data. Nature Communications, 15(1), 1-20, Article 7561. https://doi.org/10.1038/s41467-024-51891-9
ORCID
0000-0003-0178-1876 (Li)
Supplementary information
Li-PeerReview.pdf (1196 kB)
Peer Review File
Li-Description-SuppFiles.pdf (95 kB)
Description of Additional Supplementary Files
Li-SuppData.xlsx (129 kB)
Supplementary Dataset 1-11
Li-ReportingSummary.pdf (332 kB)
Reporting Summary
Included in
Cancer Biology Commons, Cells Commons, Computational Biology Commons, Digestive System Commons, Theory and Algorithms Commons