ORCID

0000-0001-7702-2564 (Moudden)

Document Type

Article

Publication Date

2025

DOI

10.19139/soic-2310-5070-2525

Publication Title

Statistics, Optimization & Information Computing

Volume

Advance online publication

Pages

19 pp.

Abstract

Background: Cervical cancer remains the fourth most common cancer in women globally, with 604,000 new cases annually. Early detection through cytological screening is critical, but manual interpretation suffers from high false-negative rates and requires expert pathologists, who are often unavailable in resource-limited settings.

Methods: We developed a novel hybrid framework combining InceptionV3-based deep feature extraction with Gini Index feature selection for automated cervical cancer cell classification. Using the Herlev dataset (917 Pap smear images: 242 normal, 675 abnormal), we extracted 2048 deep features per image and applied systematic feature selection to identify optimal discriminative subsets. Comprehensive clustering analysis (K-means, K-medoids, and fuzzy clustering) validated the binary classification approach. Multiple classifiers (Random Forest, kNN, Decision Tree, AdaBoost, ANN) were evaluated using stratified 100-5-fold cross-validation with rigorous statistical validation, including power analysis, bootstrap confidence intervals, and multiple comparison corrections.
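As a rough illustration of Gini-based feature ranking, the sketch below scores each feature by the largest reduction in Gini impurity achievable with a single threshold split, then keeps the top k features. It is not the authors' implementation. The data are synthetic, the function names (`gini_impurity`, `gini_gain`, `select_top_k`) and the planted feature indices are hypothetical, and the scoring procedure used in the paper may differ.

```python
import random


def gini_impurity(labels):
    """Gini impurity 1 - sum_c p_c^2 of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    counts = {}
    for y in labels:
        counts[y] = counts.get(y, 0) + 1
    return 1.0 - sum((c / n) ** 2 for c in counts.values())


def gini_gain(values, labels):
    """Largest impurity reduction from one threshold split on a single feature."""
    n = len(labels)
    parent = gini_impurity(labels)
    order = sorted(range(n), key=lambda i: values[i])
    best = 0.0
    for cut in range(1, n):
        if values[order[cut - 1]] == values[order[cut]]:
            continue  # equal values cannot be separated by a threshold
        left = [labels[i] for i in order[:cut]]
        right = [labels[i] for i in order[cut:]]
        child = (len(left) * gini_impurity(left)
                 + len(right) * gini_impurity(right)) / n
        best = max(best, parent - child)
    return best


def select_top_k(X, y, k):
    """Rank features by Gini gain and return the indices of the k best."""
    scores = [gini_gain([row[j] for row in X], y) for j in range(len(X[0]))]
    ranked = sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)
    return ranked[:k], scores


# Synthetic stand-in for deep features: 60 samples, 20 features, 2 classes.
# Features 3 and 11 are planted to separate the classes; the rest are noise.
rng = random.Random(0)
y = [i % 2 for i in range(60)]
X = []
for label in y:
    row = [rng.gauss(0.0, 1.0) for _ in range(20)]
    row[3] = label + rng.uniform(0.0, 0.5)
    row[11] = label + rng.uniform(0.0, 0.5)
    X.append(row)

selected, scores = select_top_k(X, y, k=2)
print("selected feature indices:", sorted(selected))
```

In a pipeline like the one described, the rows of `X` would be the 2048-dimensional InceptionV3 feature vectors and `k` would be 5. The exhaustive threshold search here is quadratic in the number of samples and is written for readability rather than speed.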

Results: Random Forest achieved optimal performance with 99.8% accuracy using only 5 selected features, a roughly 400-fold reduction from the original 2048-dimensional feature space, while maintaining performance equivalent to methods using 20+ features. Clinical error analysis revealed a 0.9% false-negative rate (6/675 missed cancers) and a 0.1% false-positive rate (2/242 unnecessary referrals), both substantially lower than documented manual screening benchmarks. Comprehensive clustering analysis confirmed optimal binary classification, with 2 clusters explaining 65.34% of variance. Statistical significance testing demonstrated performance equivalent to the best existing methods (p > 0.05) with superior computational efficiency.

Conclusions: Our framework achieves state-of-the-art cervical cancer classification accuracy while dramatically reducing computational requirements through intelligent feature selection. The 5-feature requirement enables real-time deployment (< 0.1 seconds/image) on standard clinical hardware, addressing critical implementation barriers in resource-constrained environments. Superior error rates compared to manual screening, combined with objective performance metrics, support integration into automated workflows for improved cervical cancer detection globally.

Rights

© 2025 The Authors and International Academic Press.

This work is licensed under a Creative Commons Attribution 3.0 Unported (CC BY 3.0) License that allows others to share the work with an acknowledgement of the work's authorship and initial publication in this journal.

Original Publication Citation

Assawab, R., Ouzir, M., Benyacoub, B., El Allati, A., & El Moudden, I. (2025). Medical image feature extraction and selection based on Inception V3 and Gini Index for cervical cancer cells identification. Statistics, Optimization & Information Computing. Advance online publication. https://doi.org/10.19139/soic-2310-5070-2525
