Document Type

Article

Publication Date

2024

DOI

10.1038/s41467-024-51891-9

Publication Title

Nature Communications

Volume

15

Issue

1

Pages

7561 (1-20)

Abstract

Single-cell RNA sequencing (scRNA-seq) technologies have become essential tools for characterizing cellular landscapes within complex tissues. Large-scale single-cell transcriptomics holds great potential for identifying rare cell types critical to the pathogenesis of diseases and biological processes. Existing methods for identifying rare cell types often rely on one-time clustering using partial or global gene expression. However, these rare cell types may be overlooked during the clustering phase, posing challenges for their accurate identification. In this paper, we propose a Cluster decomposition-based Anomaly Detection method (scCAD), which iteratively decomposes clusters based on the most differential signals in each cluster to effectively separate rare cell types and achieve accurate identification. We benchmark scCAD on 25 real-world scRNA-seq datasets, demonstrating its superior performance compared to 10 state-of-the-art methods. In-depth case studies across diverse datasets, including mouse airway, brain, intestine, human pancreas, immunology data, and clear cell renal cell carcinoma, showcase scCAD’s efficiency in identifying rare cell types in complex biological scenarios. Furthermore, scCAD can correct the annotation of rare cell types and identify immune cell subtypes associated with disease, thereby offering valuable insights into disease progression.

Rights

© 2024 The Authors.

This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License (CC BY-NC-ND 4.0), which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if you modified the licensed material. You do not have permission under this license to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

Data Availability

Article states: "The details of the datasets used in this study are reported in Supplementary Table 1. All described datasets are obtained from various public websites under accession codes provided in Supplementary Table 1, including NCBI Gene Expression Omnibus (GEO) [https://www.ncbi.nlm.nih.gov/geo/], ArrayExpress [https://www.ebi.ac.uk/arrayexpress/], Sequence Read Archive (SRA) [https://www.ncbi.nlm.nih.gov/sra]. 10X PBMC is obtained at Github [https://github.com/ttgump/scDeepCluster/blob/master/scRNA-seq%20data/10X_PBMC.h5]. 68k PBMC and Jurkat datasets are obtained from the website of 10X genomics ([https://www.10xgenomics.com/datasets/fresh-68-k-pbm-cs-donor-a-1-standard-1-1-0], [https://www.10xgenomics.com/datasets/50-percent-50-percent-jurkat-293-t-cell-mixture-1-standard-1-1-0]). The worm neuron cells dataset Cao is sampled from a dataset obtained from the sci-RNA-seq platform (single-cell combinatorial indexing RNA sequencing) [http://atlas.gs.washington.edu/worm-rna/docs/]. The preprocessed human tonsil data, named Tonsil, and Crohn data are available from Broad Institute Single Cell Portal ([https://singlecell.broadinstitute.org/single_cell/study/SCP2169/slide-tags-snrna-seq-on-human-tonsil], [https://singlecell.broadinstitute.org/single_cell/study/SCP359/ica-ileum-lamina-propria-immunocytes-sinai]). The mouse retina data and B_ lymphoma data are available at Github [https://github.com/OSU-BMBL/marsgt/tree/main/Data]. Source data are provided with this paper."

Original Publication Citation

Xu, Y., Wang, S., Feng, Q., Xia, J., Li, Y., Li, H.-D., & Wang, J. (2024). scCAD: Cluster decomposition-based Anomaly Detection for rare cell identification in single-cell expression data. Nature Communications, 15(1), 1-20, Article 7561. https://doi.org/10.1038/s41467-024-51891-9

ORCID

0000-0003-0178-1876 (Li)

Li-SuppInfo.pdf (11991 kB)
Supplementary information

Li-PeerReview.pdf (1196 kB)
Peer Review File

Li-Description-SuppFiles.pdf (95 kB)
Description of Additional Supplementary Files

Li-SuppData.xlsx (129 kB)
Supplementary Dataset 1-11

Li-ReportingSummary.pdf (332 kB)
Reporting Summary
