Academic Paper

Detecting Overlapping Areas in Unbalanced High-Dimensional Data Using Neighborhood Rough Set and Genetic Programming
Document Type
Periodical
Source
IEEE Transactions on Evolutionary Computation (IEEE Trans. Evol. Computat.). 27(4):1130-1144, Aug. 2023
Subject
Computing and Processing
Costs
Task analysis
Training
Support vector machines
Sampling methods
Genetic programming
Rough sets
Class overlap
genetic programming (GP)
high dimensionality
rough sets
unbalanced classification
Language
ISSN
1089-778X
1941-0026
Abstract
Unbalanced classification has attracted widespread interest because of its broad applications. However, mainly due to the uneven class distribution, constructed classifiers are usually biased toward the majority class and therefore perform poorly on the minority class. Unfortunately, the minority class is often the class of interest in many real-world applications. High dimensionality often further degrades classification performance, making the class imbalance issue more complicated to address. Genetic programming (GP) has been applied to construct classifiers, and it can simultaneously select good-quality features to improve classification performance. To handle the class imbalance issue, cost-sensitive GP classifiers treat the minority class as more important than the majority class, but this may decrease accuracy in overlapping areas, where the prior probabilities of the two classes are almost the same. To date, most studies of cost-sensitive classification have not specifically investigated how the impact of overlapping areas on cost-sensitive classifiers can be avoided. In this study, we propose a new cost-sensitive GP method in which rough set theory is employed to detect overlapping areas before training cost-sensitive classifiers on unbalanced high-dimensional data. The proposed method is compared with 46 popular classification methods, including 10 GP methods and 36 non-GP methods, on 14 datasets that are unbalanced and high dimensional. The experimental results indicate that the proposed method performs better than the compared methods in almost all cases.
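The abstract does not describe the detection procedure in detail, so the following is only a generic sketch of how a neighborhood rough set can flag overlapping areas, not the authors' method. It assumes Euclidean distance and a hypothetical neighborhood radius `delta`; the function name `boundary_region` and the toy data are illustrative. A sample falls in the boundary (overlapping) region when its neighborhood contains more than one class, that is, when it belongs to the upper approximation of a class but not to its lower approximation.

```python
from math import dist


def boundary_region(points, labels, delta):
    """Return indices of samples whose delta-neighborhood mixes classes.

    Assumption: Euclidean distance and a fixed radius `delta`; both are
    illustrative choices, not taken from the paper.
    """
    boundary = []
    for i, p in enumerate(points):
        # Neighborhood of sample i: every sample within distance delta
        # (this always includes sample i itself).
        neighbor_labels = {
            labels[j] for j, q in enumerate(points) if dist(p, q) <= delta
        }
        # A pure neighborhood places the sample in the lower approximation
        # of its class; a mixed one places it in the boundary region.
        if len(neighbor_labels) > 1:
            boundary.append(i)
    return boundary


# Toy data: two well-separated clusters plus two close samples of
# different classes in the middle.
points = [(0.0, 0.0), (0.1, 0.0), (1.0, 1.0), (1.1, 1.0), (0.5, 0.5), (0.55, 0.5)]
labels = [0, 0, 1, 1, 0, 1]
overlap = boundary_region(points, labels, delta=0.2)
print(overlap)
```

In a pipeline of the kind the abstract outlines, samples flagged this way could be handled separately before a cost-sensitive classifier is trained on the rest; how the paper itself treats them is not stated in this record.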