Academic Paper

Evaluating the Performance of Data Level Methods Using KEEL Tool to Address Class Imbalance Problem.
Document Type
Article
Source
Arabian Journal for Science & Engineering (Springer Science & Business Media B.V.). Aug2022, Vol. 47 Issue 8, p9741-9754. 14p.
Subject
*CREDIT card fraud
*CLASSIFICATION algorithms
*NUCLEAR explosions
*TELEPHONE calls
*OIL spills
Language
ISSN
2193-567X
Abstract
The class imbalance problem (CIP) has become a hot topic in machine learning in recent years because of its growing importance. As the application areas of technology expand, the size and variety of data also increase. Most real-world raw data is inherently imbalanced, for example in credit card fraud, fraudulent telephone calls, shuttle system failures, text classification, nuclear explosions, oil spill detection, and the detection of brain tumor images. Classification algorithms cannot classify imbalanced data accurately, and their results are biased toward the larger (majority) class. This problem is known as the class imbalance problem. This paper assesses various data-level methods that are used to balance the data before classification. It also discusses the characteristics of data that affect the class imbalance problem and the reasons why traditional classification algorithms cannot tackle it. In addition, it discusses other data abnormalities that make the CIP more critical, such as the size of the data, overlapping classes, the presence of noise, and the data distribution within each class. The paper empirically compares 20 data-level classification methods on 44 real imbalanced UCI data sets, with imbalance ratios ranging from 1.82 to 129.44, using the KEEL tool. The performance of the methods is assessed with the AUC, F-measure, and G-mean metrics, and the results are analyzed and presented graphically. [ABSTRACT FROM AUTHOR]
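The quantities named in the abstract can be illustrated with a short sketch. The Python below is not taken from the paper or from KEEL. It assumes the usual definition of imbalance ratio (majority-class count divided by minority-class count), uses random oversampling only as one simple example of a data-level balancing method, and computes F-measure and G-mean from a binary confusion matrix. The function names `imbalance_ratio`, `random_oversample`, and `f_measure_and_g_mean` are hypothetical. AUC is omitted because it needs classifier scores rather than hard labels.

```python
import random
from collections import Counter
from math import sqrt


def imbalance_ratio(labels):
    """Imbalance ratio (IR), assumed here as majority-class count / minority-class count."""
    counts = Counter(labels)
    return max(counts.values()) / min(counts.values())


def random_oversample(X, y, seed=0):
    """Random oversampling (one simple data-level method, used only as an
    illustration): duplicate randomly chosen rows of each smaller class until
    every class matches the majority-class size."""
    rng = random.Random(seed)
    counts = Counter(y)
    target = max(counts.values())
    X_out, y_out = list(X), list(y)
    for label, size in counts.items():
        members = [x for x, lab in zip(X, y) if lab == label]
        for _ in range(target - size):  # no-op for the majority class
            X_out.append(rng.choice(members))
            y_out.append(label)
    return X_out, y_out


def f_measure_and_g_mean(y_true, y_pred, positive=1):
    """F-measure and G-mean for a binary problem; `positive` is the minority class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    recall = tp / (tp + fn) if tp + fn else 0.0        # true positive rate
    specificity = tn / (tn + fp) if tn + fp else 0.0   # true negative rate
    precision = tp / (tp + fp) if tp + fp else 0.0
    f_measure = (2 * precision * recall / (precision + recall)
                 if precision + recall else 0.0)
    g_mean = sqrt(recall * specificity)
    return f_measure, g_mean


if __name__ == "__main__":
    # Toy binary data set: 8 majority (0) vs 2 minority (1) examples.
    X = [[float(i)] for i in range(10)]
    y = [0] * 8 + [1] * 2
    print(imbalance_ratio(y))                  # 4.0
    X_bal, y_bal = random_oversample(X, y)
    print(sorted(Counter(y_bal).items()))      # [(0, 8), (1, 8)]
    # A classifier that always predicts the majority class:
    # accuracy is 8/10, but F-measure and G-mean are both 0.
    print(f_measure_and_g_mean(y, [0] * 10))   # (0.0, 0.0)
```

On this toy data, a classifier biased entirely toward the majority class still reaches 80% accuracy, yet both F-measure and G-mean are 0. This illustrates why metrics that account for minority-class performance are used when evaluating methods on imbalanced data.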