ORCID

0000-0003-4162-0276 (Colen)

Document Type

Article

Publication Date

2025

DOI

10.21203/rs.3.rs-7350820/v1

Publication Title

Research Square

Pages

21 pp.

Abstract

Radiomics-based machine learning models have the potential to detect lung cancer at inception from CT scans and transform patient outcomes. Low malignancy rates in early-development pulmonary nodules (PNs) and variable image acquisition hinder development of clinically applicable radiomics-based early detection models. To address these challenges, we augmented training using later-development PNs and harmonized for acquisition effects. We first trained machine learning models to predict PN malignancy using radiomic features from scans of early-development benign and malignant PNs (n = 187) harmonized using ComBat. Observing near-chance performance, we augmented training with later-development benign and malignant PNs (n = 225). We evaluated whether harmonization must incorporate biological differences that impact acquisition effects in added training data. To correct features for variability in four acquisition parameters, we compared: 1) harmonizing without biological distinction; 2) harmonizing with a covariate distinguishing the early-development, benign augmentation, and malignant augmentation training datasets; and 3) harmonizing each dataset separately. Models trained using augmented data harmonized without biological distinction failed to improve. Models trained on augmented data harmonized with a covariate (ROC-AUC 0.72 [0.67–0.76]) or separately (ROC-AUC 0.69 [0.63–0.74]) achieved significantly higher test ROC-AUC (DeLong test, adjusted p ≤ 0.05). Our findings lay groundwork for clinically viable radiomics tools harnessing routine screening imaging for lung cancer early detection.

Rights

© 2025 The Authors.

This work is licensed under a Creative Commons Attribution 4.0 International (CC BY 4.0) License.

Data Availability

Article states: "The imaging datasets used in this study are currently restricted from public release due to data privacy laws and institutional review board policy. Anonymized radiomic feature data, acquisition parameters, and diagnostic data which were analyzed in this study are available from the corresponding author on reasonable request."

Comments

This is a preprint; it has not been peer reviewed by a journal.

Under Revision at Scientific Reports.

Original Publication Citation

Huchthausen, C., Shi, M., Sousa, G. L., Larner, J., Janowski, E., Colen, J., & Wijesooriya, K. (2025). Training set augmentation and harmonization enables radiomic models to detect early onset of lung cancer. Research Square. https://doi.org/10.21203/rs.3.rs-7350820/v1
