Influence of Gender-Specific Data Imbalance on scGPT Fine-Tuning for Single-Cell Genomics
Document Type
Conference Paper
Publication Date
2025
DOI
10.1609/aaaiss.v7i1.36914
Publication Title
Proceedings of the AAAI Symposium Series
Volume
7
Issue
1
Pages
420-427
Conference Name
AAAI 2025 Fall Symposium Series, November 6-8, 2025, Arlington, Virginia
Abstract
The transformer-based foundation model scGPT has demonstrated strong capabilities in analyzing high-dimensional single-cell RNA sequencing data. However, the impact of demographic factors, particularly gender, on model performance remains insufficiently understood. Gender is known to influence cell type composition in the immune system. Here, using this gender-sensitive cell type composition, we comprehensively evaluate how gender-related imbalance in training data influences the performance of scGPT in cell type prediction. We fine-tune scGPT on male-only, female-only, and mixed-gender subsets from two large-scale datasets containing immune cells. We use a logit difference to measure the confidence gap between the true label and the model's actual prediction. The confidence gap is zero for correct classifications and negative for incorrect predictions. We find that training and testing configurations with aligned gender distributions generally show higher prediction confidence, while mismatched gender distributions between training and testing, especially when training excludes one gender, lead to substantial confidence drops. We also find that training with mixed-gender data promotes more balanced generalization but does not eliminate all biases. We conclude that gender-specific data imbalance, represented by variation in immune cell type subpopulations between women and men, can influence fine-tuning of scGPT and its performance in cell type classification, highlighting the importance of addressing such demographic biases in biomedical AI models.
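The confidence gap described in the abstract can be illustrated with a minimal sketch. It assumes the gap is the logit of the true class minus the logit of the predicted (highest-scoring) class, which matches the stated property of being zero for correct predictions and negative otherwise. The function name `confidence_gap` and the example logits are hypothetical and are not taken from the paper.

```python
def confidence_gap(logits, true_label):
    """Logit of the true class minus the logit of the predicted class.

    Assumption: the predicted class is the one with the largest logit.
    Under that assumption the result is 0.0 when the prediction is
    correct and negative when it is not.
    """
    return logits[true_label] - max(logits)


# Hypothetical logits over three cell types for two cells.
correct_cell = [2.0, 0.5, -1.0]    # true label 0 has the largest logit
incorrect_cell = [0.5, 2.0, -1.0]  # true label 0, but class 1 is predicted

gap_correct = confidence_gap(correct_cell, true_label=0)      # 2.0 - 2.0
gap_incorrect = confidence_gap(incorrect_cell, true_label=0)  # 0.5 - 2.0
```

Under this reading, a more negative gap means the model favored a wrong cell type by a larger margin, so averaging the gap over a test set gives one number per training and testing configuration.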
Rights
© 2023, Association for the Advancement of Artificial Intelligence. All rights reserved.
"In the Returned Rights section of the AAAI copyright form, authors are specifically granted back the right to use their own papers for noncommercial uses, such as inclusion in their dissertations or the right to deposit their own papers in their institutional repositories, provided there is proper attribution. The published version is not available for posting outside the AAAI Digital Library."
Original Publication Citation
Al Amin, M. A. U., Filienko, D., & Qin, H. (2025). Influence of gender-specific data imbalance on scGPT fine-tuning for single-cell genomics. Proceedings of the AAAI Symposium Series, 7(1), 420-427. https://doi.org/10.1609/aaaiss.v7i1.36914
ORCID
0000-0002-1060-6722 (Qin)