Abstract/Description/Artist Statement

The transformer-based foundation model scGPT has demonstrated strong capabilities in analyzing high-dimensional single-cell RNA sequencing data. However, the impact of demographic factors, particularly gender, on model performance remains insufficiently understood. Gender is known to influence cell-type composition in the immune system. Here, we used this gender-sensitive composition to evaluate how gender imbalance in the training data influences the performance of scGPT in cell-type prediction. We fine-tuned scGPT on male-only, female-only, and mixed-gender subsets from two large-scale datasets containing immune cells. We used a logit difference to measure the confidence gap between the true label and the model's predicted label. The confidence gap is zero for correct classifications and negative for incorrect predictions. We observed that training and testing configurations with aligned gender distributions generally showed higher prediction confidence, whereas mismatched gender between training and testing, especially when training excluded one gender, led to substantial drops in confidence. We also found that training with mixed-gender data promoted more balanced generalization but did not eliminate all biases. We conclude that gender-specific data imbalance, represented by variation in immune cell-type subpopulations between women and men, can influence the fine-tuning of scGPT and its performance in cell-type classification, highlighting the importance of addressing such demographic biases in biomedical AI models.
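One way to read the confidence gap described above is as the true-class logit minus the logit of the predicted (highest-scoring) class. The sketch below illustrates that reading only. The function names `confidence_gap` and `mean_confidence_gap` and the example logits are hypothetical, and the exact formulation used in the study may differ.

```python
from typing import Sequence


def confidence_gap(logits: Sequence[float], true_label: int) -> float:
    """Return the true-class logit minus the predicted-class logit.

    Assumption: the predicted class is the arg-max over the logits, so the
    gap is 0 when the prediction is correct and negative otherwise.
    """
    predicted_logit = max(logits)
    return logits[true_label] - predicted_logit


def mean_confidence_gap(
    batch_logits: Sequence[Sequence[float]], true_labels: Sequence[int]
) -> float:
    """Average the per-cell confidence gap over a batch of cells."""
    gaps = [confidence_gap(row, label) for row, label in zip(batch_logits, true_labels)]
    return sum(gaps) / len(gaps)


# Hypothetical logits for one cell over three cell types.
example_logits = [2.0, 0.5, -1.0]

gap_when_correct = confidence_gap(example_logits, true_label=0)  # true class is the arg-max
gap_when_wrong = confidence_gap(example_logits, true_label=1)    # true class is not the arg-max
```

Under this reading, averaging the gap over the cells of one train/test configuration (for example, trained on male-only data and tested on female-only data) gives a single number that can be compared across configurations: values closer to zero indicate higher confidence in the true label.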

Presenting Author Name/s

Ashish Acharya

Faculty Advisor/Mentor

Hong Qin

Faculty Advisor/Mentor Email

hqin@odu.edu

Faculty Advisor/Mentor Department

Computer Science

College/School Affiliation

Batten College of Engineering & Technology

Student Level Group

Graduate/Professional

Presentation Type

Poster

Influence of Gender-Specific Data Imbalance on scGPT Fine-Tuning for Single-Cell Genomics