Document Type

Article

Publication Date

2016

Publication Title

Journal of Digital Information Management (JDIM)

Volume

14

Issue

2

Pages

92-103

Abstract

Classification of imbalanced data has been recognized as a crucial problem in machine learning and data mining. In an imbalanced dataset, minority class instances are likely to be misclassified. When the synthetic minority over-sampling technique (SMOTE) is applied to imbalanced dataset classification, the same sampling rate is set for all minority class samples when new samples are synthesized, so the over-sampling is blind to differences among those samples. To overcome this problem, an improved SMOTE algorithm based on a genetic algorithm (GA), namely GASMOTE, was proposed. First, GASMOTE set different sampling rates for different minority class samples; each combination of sampling rates corresponded to an individual in the population. Second, the selection, crossover, and mutation operators of the GA were iteratively applied to the population until the stopping criteria were met, yielding the best combination of sampling rates. Lastly, the best combination of sampling rates was used in SMOTE to synthesize new samples. Experimental results on 10 typical imbalanced datasets show that GASMOTE increases the F-measure value by 5.9% and the G-mean value by 1.6% compared with the SMOTE algorithm. Meanwhile, GASMOTE increases the F-measure value by 3.7% and the G-mean value by 2.3% compared with the borderline-SMOTE algorithm. GASMOTE can therefore be utilized as a new over-sampling technique to address the problem of imbalanced dataset classification. The GASMOTE algorithm was then applied to a practical engineering problem, namely, prediction of rockburst on VCR rockburst datasets. The experimental results indicate that the GASMOTE algorithm can accurately predict rockburst occurrence and thus provides guidance for the design and construction of safe deep-mining engineering structures.
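
The three steps in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not specify the encoding details, the GA operators, or the fitness function, so the integer rate encoding, tournament selection, single-point crossover, random-reset mutation, elitism, and the names smote_with_rates, evolve_rates, and toy_fitness are all assumptions made here for illustration. In practice the fitness would presumably score a classifier trained on the over-sampled data (for example by F-measure or G-mean); the toy fitness below only rewards a target number of synthetic samples.

```python
import math
import random


def k_nearest(points, i, k):
    """Indices of the k points closest to points[i], excluding i itself."""
    others = [j for j in range(len(points)) if j != i]
    others.sort(key=lambda j: math.dist(points[i], points[j]))
    return others[:k]


def smote_with_rates(minority, rates, k=3, rng=random):
    """SMOTE-style interpolation with a separate rate per minority sample.

    rates[i] is the number of synthetic samples generated around minority[i].
    """
    synthetic = []
    for i, n_new in enumerate(rates):
        neighbours = k_nearest(minority, i, k)
        for _ in range(n_new):
            j = rng.choice(neighbours)
            gap = rng.random()  # position along the segment between i and j
            synthetic.append(
                tuple(a + gap * (b - a) for a, b in zip(minority[i], minority[j]))
            )
    return synthetic


def evolve_rates(n_samples, fitness, max_rate=5, pop_size=20,
                 generations=30, p_mut=0.1, rng=random):
    """Search for a rate vector that maximises `fitness` (assumed GA operators)."""
    # Each individual is one combination of per-sample sampling rates.
    pop = [[rng.randint(0, max_rate) for _ in range(n_samples)]
           for _ in range(pop_size)]
    best = max(pop, key=fitness)
    for _ in range(generations):  # assumed stopping criterion: fixed generations
        nxt = [best[:]]  # elitism: carry the best individual forward
        while len(nxt) < pop_size:
            # Tournament selection of two parents.
            p1 = max(rng.sample(pop, 2), key=fitness)
            p2 = max(rng.sample(pop, 2), key=fitness)
            # Single-point crossover.
            cut = rng.randint(1, n_samples - 1) if n_samples > 1 else 0
            child = p1[:cut] + p2[cut:]
            # Random-reset mutation, gene by gene.
            child = [rng.randint(0, max_rate) if rng.random() < p_mut else g
                     for g in child]
            nxt.append(child)
        pop = nxt
        best = max(pop, key=fitness)
    return best


if __name__ == "__main__":
    rng = random.Random(42)
    minority = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]

    def toy_fitness(rates):
        # Hypothetical stand-in: prefer rate vectors that yield 8 new samples.
        return -abs(sum(rates) - 8)

    best_rates = evolve_rates(len(minority), toy_fitness, rng=rng)
    new_samples = smote_with_rates(minority, best_rates, k=2, rng=rng)
    print(best_rates, len(new_samples))
```

Passing the fitness as a callable keeps the search loop independent of any particular classifier or evaluation metric, which is where a real implementation would differ most from this sketch.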

Rights

Copyright © 2016 Journal of Digital Information Management (JDIM).

"The authors can host the copy of their papers in their home pages and institutional repositories after one month of the publication of the papers. The source of publication of the article must be clearly stated and the journal name including the imprint should be available in each page of the published papers. The authors can host the electronic version for unlimited period. However, the authors are not permitted to distribute the print version of the papers to any one."

Included in accordance with publisher policy.

Original Publication Citation

Gu, Q., Wang, X.-M., Wu, Z., Ning, B., & Xin, C.-S. (2016). An improved SMOTE algorithm based on genetic algorithm for imbalanced data collection. Journal of Digital Information Management (JDIM), 14(2), 92-103. https://www.dline.info/fpaper/jdim/v14i2/jdimv14i2_3.pdf
