Document Type


Publication Date




Publication Title







540 (1-21)


Purpose: To assess the efficacy of various machine learning (ML) algorithms in predicting late-stage colorectal cancer (CRC) diagnoses against the backdrop of socio-economic and regional healthcare disparities.

Methods: An innovative theoretical framework was developed to integrate individual- and census tract-level social determinants of health (SDOH) with sociodemographic factors. A comparative analysis of the ML models was conducted using key performance metrics such as AUC-ROC to evaluate their predictive accuracy. Spatio-temporal analysis was used to identify disparities in late-stage CRC diagnosis probabilities.

Results: Gradient boosting emerged as the superior model, with the top predictors for late-stage CRC diagnosis being anatomic site, year of diagnosis, age, proximity to Superfund sites, and primary payer. Spatio-temporal clusters highlighted geographic areas with a statistically significant high probability of late-stage diagnoses, emphasizing the need for targeted healthcare interventions.

Conclusions: This research underlines the potential of ML to enhance prognostic predictions in oncology, particularly in CRC. The gradient boosting model, with its robust performance, holds promise for deployment in healthcare systems to aid early detection and formulate localized cancer prevention strategies. The study’s methodology demonstrates a significant step toward utilizing AI in public health to mitigate disparities and improve cancer care outcomes.
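
Illustrative note: the abstract names AUC-ROC as a key metric for comparing the ML models. The following is a minimal sketch of how that metric can be computed from predicted scores and true labels, using the rank-based (Mann-Whitney) formulation. It is not the authors' code; the function name auc_roc, the labels, and the two sets of model scores are hypothetical toy values chosen for illustration and are not study data.

```python
def auc_roc(labels, scores):
    """Rank-based AUC-ROC: the probability that a randomly chosen
    positive case (label 1) receives a higher score than a randomly
    chosen negative case (label 0), counting ties as one half."""
    pairs = sorted(zip(scores, labels))
    n = len(pairs)

    # Assign 1-based ranks, averaging the ranks of tied scores.
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        while j + 1 < n and pairs[j + 1][0] == pairs[i][0]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[k] = average_rank
        i = j + 1

    n_pos = sum(1 for _, y in pairs if y == 1)
    n_neg = n - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("AUC needs at least one positive and one negative label")

    # Mann-Whitney U statistic for the positives, scaled to [0, 1].
    rank_sum_pos = sum(r for r, (_, y) in zip(ranks, pairs) if y == 1)
    return (rank_sum_pos - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)


# Hypothetical toy data: 1 = late-stage diagnosis, 0 = early-stage.
late_stage = [0, 0, 1, 1]
model_a_scores = [0.1, 0.4, 0.35, 0.8]   # assumed output of one candidate model
model_b_scores = [0.2, 0.3, 0.6, 0.9]    # assumed output of another candidate model

auc_a = auc_roc(late_stage, model_a_scores)
auc_b = auc_roc(late_stage, model_b_scores)
print(f"model A AUC = {auc_a:.2f}, model B AUC = {auc_b:.2f}")
```

In a comparison of this kind, the model with the higher AUC on held-out data ranks late-stage cases above early-stage cases more consistently; an AUC of 0.5 corresponds to chance-level ranking.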


© 2024 by the authors.

This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution 4.0 International (CC BY 4.0) license.

Data Availability

Article states: "Data are contained within the article and Supplementary Materials."

Original Publication Citation

Galadima, H., Anson-Dwamena, R., Johnson, A., Bello, G., Adunlin, G., & Blando, J. (2024). Machine learning as a tool for early detection: A focus on late-stage colorectal cancer across socioeconomic spectrums. Cancers, 16(3), 1-21, Article 540.


0000-0003-1588-3929 (Galadima), 0000-0001-5619-499X (Blando)

cancers-2825773-supplementary.pdf (301 kB)
Table S1: Baseline Characteristics by Stage of Diagnosis Status; Table S2: Neighborhood Census Tracts Characterized by Stage of Diagnosis Status