ORCID
0000-0003-4162-0276 (Colen)
Document Type
Article
Publication Date
2025
DOI
10.21203/rs.3.rs-7350820/v1
Publication Title
Research Square
Pages
21 pp.
Abstract
Radiomics-based machine learning models have the potential to detect lung cancer at inception from CT scans and transform patient outcomes. Low malignancy rates in early-development pulmonary nodules (PNs) and variable image acquisition hinder the development of clinically applicable radiomics-based early detection models. To address these challenges, we augmented the training data with later-development PNs and harmonized features to correct for acquisition effects. We first trained machine learning models to predict PN malignancy using radiomic features from scans of early-development benign and malignant PNs (n = 187), harmonized using ComBat. Because these models performed near chance, we augmented the training set with later-development benign and malignant PNs (n = 225). We evaluated whether harmonization must incorporate biological differences in the added training data that impact acquisition effects. To correct features for variability in four acquisition parameters, we compared three strategies: 1) harmonization without biological distinction, 2) harmonization with a covariate distinguishing the early-development, benign augmentation, and malignant augmentation training datasets, and 3) harmonization of each dataset separately. Models trained on augmented data harmonized without biological distinction failed to improve. Models trained on augmented data harmonized with a covariate (ROC-AUC 0.72 [0.67–0.76]) or separately (ROC-AUC 0.69 [0.63–0.74]) achieved significantly higher test ROC-AUC (DeLong test, adjusted p ≤ 0.05). Our findings lay the groundwork for clinically viable radiomics tools that harness routine screening imaging for the early detection of lung cancer.
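The three harmonization strategies compared in the abstract can be illustrated with a simplified ComBat-style adjustment. This is a hypothetical sketch, not the authors' code: it implements only ComBat's per-batch location/scale correction (omitting the empirical Bayes shrinkage that full ComBat applies), assumes a single categorical `batch` label standing in for the four acquisition parameters, and runs on synthetic data. The function name `combat_location_scale` and the labels `batch` and `dataset` are illustrative assumptions.

```python
import numpy as np


def combat_location_scale(X, batch, covariate=None):
    """Simplified ComBat-style harmonization (no empirical Bayes shrinkage).

    X: (n_samples, n_features) feature matrix.
    batch: (n_samples,) acquisition label (hypothetical single label).
    covariate: optional (n_samples,) biological/dataset label whose effect is kept.
    """
    X = np.asarray(X, dtype=float)
    batch = np.asarray(batch)
    n = X.shape[0]
    levels = np.unique(batch)
    k = len(levels)
    B = (batch[:, None] == levels[None, :]).astype(float)  # batch indicators
    if covariate is None:
        C = np.zeros((n, 0))
    else:
        covariate = np.asarray(covariate)
        groups = np.unique(covariate)
        # One-hot encode, dropping the first level as the reference category
        C = (covariate[:, None] == groups[None, 1:]).astype(float)
    design = np.hstack([B, C])
    coef, *_ = np.linalg.lstsq(design, X, rcond=None)
    gamma, beta = coef[:k], coef[k:]
    # Grand mean: batch intercepts averaged with weights proportional to batch size
    alpha = (B.sum(axis=0) / n) @ gamma
    sigma = np.sqrt(((X - design @ coef) ** 2).mean(axis=0))  # pooled residual SD
    bio = alpha + C @ beta  # component of the signal to preserve
    Z = (X - bio) / sigma   # standardized features
    out = np.empty_like(X)
    for level in levels:
        idx = batch == level
        g_hat = Z[idx].mean(axis=0)         # per-batch location shift
        d_hat = Z[idx].std(axis=0, ddof=1)  # per-batch scale factor
        out[idx] = (Z[idx] - g_hat) / d_hat
    return out * sigma + bio


rng = np.random.default_rng(0)
n = 120
batch = rng.integers(0, 2, size=n)    # two hypothetical acquisition settings
dataset = rng.integers(0, 3, size=n)  # early-development / benign aug. / malignant aug.
X = rng.normal(size=(n, 4)) + 1.5 * batch[:, None] + 0.8 * dataset[:, None]

# Strategy 1: harmonize without biological distinction
X1 = combat_location_scale(X, batch)
# Strategy 2: keep the dataset label as a preserved covariate
X2 = combat_location_scale(X, batch, covariate=dataset)
# Strategy 3: harmonize each dataset separately
X3 = np.empty_like(X)
for d in np.unique(dataset):
    idx = dataset == d
    X3[idx] = combat_location_scale(X[idx], batch[idx])
```

Without the covariate, any difference in dataset composition between batches is folded into the estimated batch shift and removed along with the acquisition effect; with the covariate, that component is modeled separately and added back after correction. In the synthetic data, batch and dataset are drawn independently so that all three strategies are well defined. A design in which a dataset is confined to a single batch would make the covariate model rank-deficient.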
Rights
© 2025 The Authors.
This work is licensed under a Creative Commons Attribution 4.0 International (CC BY 4.0) License.
Data Availability
Article states: "The imaging datasets used in this study are currently restricted from public release due to data privacy laws and institutional review board policy. Anonymized radiomic feature data, acquisition parameters, and diagnostic data which were analyzed in this study are available from the corresponding author on reasonable request."
Original Publication Citation
Huchthausen, C., Shi, M., Sousa, G. L., Larner, J., Janowski, E., Colen, J., & Wijesooriya, K. (2025). Training set augmentation and harmonization enables radiomic models to detect early onset of lung cancer. Research Square. https://doi.org/10.21203/rs.3.rs-7350820/v1
Comments
This is a preprint; it has not been peer reviewed by a journal.
Under revision at Scientific Reports.