Academic Journal Article

Feature Selection Based on Intrusive Outliers Rather Than All Instances
Document Type
Periodical
Source
IEEE Transactions on Image Processing, vol. 33, pp. 809-824, 2024
Subject
Signal Processing and Analysis
Communication, Networking and Broadcast Technologies
Computing and Processing
Feature extraction
Training
Task analysis
Solid modeling
Mutual information
Measurement
Face recognition
Supervised feature selection
intrusive outlier
density-mean center
overlapping class
classification
Language
ISSN
1057-7149
1941-0042
Abstract
Feature selection (FS) has recently attracted considerable attention in many fields. Highly overlapping classes and skewed distributions of data within classes have been found in various classification tasks. Most existing FS methods are based on all instances and ignore the significant differences in characteristics between particular outliers and the main body of a class, which confuses classifiers. In this paper, we propose a novel supervised FS method, Intrusive Outliers-based Feature Selection (IOFS), which identifies the kinds of outliers that lead to misclassification and exploits their characteristics. To identify the intrusive outliers (IOs) accurately, we provide a density-mean center algorithm that obtains an appropriate representative of each class. A dedicated distance threshold is then used to obtain candidate IOs. Combined with several metrics, mathematical formulations are provided to evaluate the degree of overlap of intrusive class pairs. Features with a high degree of overlap are assigned low rankings in the IOFS method. An extension of IOFS based on a small number of extreme IOs, called E-IOFS, is also proposed. Three proofs are provided to establish the theoretical basis of IOFS. Experiments comparing IOFS against various state-of-the-art methods on eleven benchmark datasets show that it is reasonable and effective, especially on datasets with more heavily overlapping classes, and that E-IOFS almost always outperforms IOFS.
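The abstract gives the overall pipeline (class representative, candidate outliers, degree of overlap, feature ranking) but not the formulas, so the following is only a rough sketch of that general idea and not the IOFS algorithm itself. Everything in it is an assumption made for illustration: the Gaussian-kernel density weighting in `density_mean_center`, the "closer to the other class's center than to its own" rule that stands in for the paper's distance threshold, and the function names `overlap_degree` and `rank_features`.

```python
import numpy as np


def density_mean_center(x, bandwidth=None):
    """Density-weighted mean of a 1-D sample.

    Assumption: a Gaussian-kernel density weight per point; the paper's
    actual density-mean center definition is not given in the abstract.
    """
    x = np.asarray(x, dtype=float)
    if bandwidth is None:
        # Fall back to 1.0 when the sample has no spread.
        bandwidth = x.std() or 1.0
    diffs = (x[:, None] - x[None, :]) / bandwidth
    density = np.exp(-0.5 * diffs ** 2).sum(axis=1)
    return float(np.sum(density * x) / np.sum(density))


def overlap_degree(xa, xb):
    """Fraction of instances that intrude into the other class (one feature).

    Assumption: an instance counts as intrusive when it lies closer to the
    other class's center than to its own; this replaces the paper's threshold.
    """
    xa = np.asarray(xa, dtype=float)
    xb = np.asarray(xb, dtype=float)
    ca, cb = density_mean_center(xa), density_mean_center(xb)
    intrusive_a = np.abs(xa - cb) < np.abs(xa - ca)
    intrusive_b = np.abs(xb - ca) < np.abs(xb - cb)
    return (intrusive_a.sum() + intrusive_b.sum()) / (len(xa) + len(xb))


def rank_features(X, y):
    """Return feature indices ordered from least to most overlapping."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    classes = np.unique(y)
    scores = np.zeros(X.shape[1])
    for j in range(X.shape[1]):
        # Sum the overlap over every unordered pair of classes.
        for i, a in enumerate(classes):
            for b in classes[i + 1:]:
                scores[j] += overlap_degree(X[y == a, j], X[y == b, j])
    return np.argsort(scores, kind="stable"), scores


if __name__ == "__main__":
    # Feature 0 separates the two classes; feature 1 overlaps heavily.
    f0 = np.concatenate([np.linspace(0, 1, 20), np.linspace(10, 11, 20)])
    f1 = np.concatenate([np.linspace(0, 1, 20), np.linspace(0.2, 1.2, 20)])
    X = np.column_stack([f0, f1])
    y = np.array([0] * 20 + [1] * 20)
    order, scores = rank_features(X, y)
    print("ranking:", order, "overlap scores:", scores)
```

Scoring each feature independently keeps the sketch short; it says nothing about how IOFS combines its metrics or how E-IOFS selects extreme IOs, which the abstract does not specify.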