Document Type
Article
Publication Date
2016
Publication Title
Journal of Digital Information Management (JDIM)
Volume
14
Issue
2
Pages
92-103
Abstract
Classification of imbalanced data has been recognized as a crucial problem in machine learning and data mining. In an imbalanced dataset, minority class instances are likely to be misclassified. When the synthetic minority over-sampling technique (SMOTE) is applied to imbalanced dataset classification, the same sampling rate is set for all samples of the minority class when synthesizing new samples, so the sampling is performed blindly, without regard to differences among individual minority samples. To overcome this problem, an improved SMOTE algorithm based on the genetic algorithm (GA), namely GASMOTE, was proposed. First, GASMOTE set different sampling rates for different minority class samples, and each combination of sampling rates corresponded to an individual in the population. Second, the selection, crossover, and mutation operators of GA were iteratively applied to the population until the stopping criteria were met, yielding the best combination of sampling rates. Lastly, this best combination of sampling rates was used in SMOTE to synthesize new samples. Experimental results on 10 typical imbalanced datasets show that GASMOTE increases the F-measure value by 5.9% and the G-mean value by 1.6% compared with the SMOTE algorithm, and by 3.7% and 2.3%, respectively, compared with the borderline-SMOTE algorithm. GASMOTE can therefore be utilized as a new over-sampling technique to address the problem of imbalanced dataset classification. The GASMOTE algorithm was then applied to a practical engineering problem, namely the prediction of rockburst on the VCR rockburst datasets. The experimental results indicate that GASMOTE can accurately predict rockburst occurrence and thus provides guidance for the design and construction of safe deep-mining engineering structures.
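The abstract describes GASMOTE only at a high level. Below is a minimal, standard-library-only Python sketch of the general idea under loudly stated assumptions. Each gene is an integer count of synthetic samples for one minority instance. Fitness is the G-mean of a 1-nearest-neighbour classifier on a held-out validation set. Selection is a 3-way tournament, crossover is uniform, and mutation resets a gene to a random rate. A fixed number of generations serves as the stopping criterion. These operator choices, the fitness function, and every function and parameter name (`smote_with_rates`, `ga_smote`, `max_rate`, `pop_size`, `generations`, `p_cx`, `p_mut`) are hypothetical illustrations, not the authors' published configuration.

```python
import math
import random


def dist(a, b):
    """Euclidean distance between two feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def smote_with_rates(minority, rates, k, rng):
    """SMOTE with one rate per sample: minority[i] yields rates[i] synthetic points.

    Requires at least two minority samples whenever any rate is positive.
    """
    synthetic = []
    for i, x in enumerate(minority):
        if rates[i] == 0:
            continue
        # k nearest minority neighbours of x, excluding x itself
        others = [m for j, m in enumerate(minority) if j != i]
        neighbours = sorted(others, key=lambda m: dist(x, m))[:k]
        for _ in range(rates[i]):
            nb = rng.choice(neighbours)
            gap = rng.random()  # uniform in [0, 1)
            # interpolate on the segment between x and the chosen neighbour
            synthetic.append([xi + gap * (ni - xi) for xi, ni in zip(x, nb)])
    return synthetic


def g_mean(y_true, y_pred):
    """Geometric mean of minority-class (1) and majority-class (0) recall."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    pos = sum(1 for t in y_true if t == 1)
    neg = len(y_true) - pos
    if pos == 0 or neg == 0:
        return 0.0
    return math.sqrt((tp / pos) * (tn / neg))


def predict_1nn(train_x, train_y, x):
    """Hypothetical stand-in classifier: label of the nearest training point."""
    best = min(range(len(train_x)), key=lambda j: dist(train_x[j], x))
    return train_y[best]


def fitness(rates, minority, majority, val_x, val_y, k, seed):
    """Hypothetical fitness: G-mean of 1-NN trained on the oversampled data."""
    synth = smote_with_rates(minority, rates, k, random.Random(seed))
    train_x = majority + minority + synth
    train_y = [0] * len(majority) + [1] * (len(minority) + len(synth))
    preds = [predict_1nn(train_x, train_y, x) for x in val_x]
    return g_mean(val_y, preds)


def ga_smote(minority, majority, val_x, val_y, k=3, max_rate=5,
             pop_size=20, generations=30, p_cx=0.8, p_mut=0.1, seed=0):
    """GA search over per-sample SMOTE rates; returns (best_rates, synthetic)."""
    rng = random.Random(seed)
    n = len(minority)

    def score(ind):
        return fitness(ind, minority, majority, val_x, val_y, k, seed)

    def tournament(scored):
        # pick the fitter of three randomly drawn individuals (needs pop_size >= 3)
        return max(rng.sample(scored, 3), key=lambda s: s[0])[1]

    # each individual is one combination of per-sample sampling rates
    pop = [[rng.randint(0, max_rate) for _ in range(n)] for _ in range(pop_size)]
    for _ in range(generations):  # fixed generation budget as stopping criterion
        scored = [(score(ind), ind) for ind in pop]
        best_ind = max(scored, key=lambda s: s[0])[1]
        next_pop = [best_ind[:]]  # elitism: carry the best individual over
        while len(next_pop) < pop_size:
            p1, p2 = tournament(scored), tournament(scored)
            if rng.random() < p_cx:  # uniform crossover
                child = [a if rng.random() < 0.5 else b for a, b in zip(p1, p2)]
            else:
                child = p1[:]
            for g in range(n):  # mutation: reset a gene to a random rate
                if rng.random() < p_mut:
                    child[g] = rng.randint(0, max_rate)
            next_pop.append(child)
        pop = next_pop
    best = max(pop, key=score)
    return best, smote_with_rates(minority, best, k, random.Random(seed))
```

Calling `ga_smote(minority, majority, val_x, val_y)` returns the best rate vector found and the synthetic minority samples it produces. The single 1-NN validation split keeps the sketch short; a practical setup would more plausibly score individuals with cross-validation and the classifier actually being deployed.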
Rights
Copyright © 2016 Journal of Digital Information Management (JDIM).
"The authors can host the copy of their papers in their home pages and institutional repositories after one month of the publication of the papers. The source of publication of the article must be clearly stated and the journal name including the imprint should be available in each page of the published papers. The authors can host the electronic version for unlimited period. However, the authors are not permitted to distribute the print version of the papers to any one."
Included in accordance with publisher policy.
Original Publication Citation
Gu, Q., Wang, X.-M., Wu, Z., Ning, B., & Xin, C.-S. (2016). An improved SMOTE algorithm based on genetic algorithm for imbalanced data collection. Journal of Digital Information Management (JDIM), 14(2), 92-103. https://www.dline.info/fpaper/jdim/v14i2/jdimv14i2_3.pdf
Repository Citation
Gu, Qiong; Wang, Xian-Ming; Wu, Zhao; Ning, Bing; and Xin, Chun-Sheng, "An Improved SMOTE Algorithm Based on Genetic Algorithm for Imbalanced Data Collection" (2016). Electrical & Computer Engineering Faculty Publications. 462.
https://digitalcommons.odu.edu/ece_fac_pubs/462
Included in
Artificial Intelligence and Robotics Commons, Data Science Commons, Electrical and Computer Engineering Commons, Theory and Algorithms Commons