Academic Paper

Parallel feature selection based on the trace ratio criterion

Document Type

Conference

Author

Nguyen, Thu; Phan, Nhan; Nguyen, Nhuong; Nguyen, Binh T.; Halvorsen, Pal; Riegler, Michael A.

Source

2022 International Joint Conference on Neural Networks (IJCNN), pp. 1-8, Jul. 2022

Subject

Bioengineering
Computing and Processing
General Topics for Engineers
Power, Energy and Industry Applications
Robotics and Control Systems
Signal Processing and Analysis
Costs
Neural networks
Memory
Feature extraction
Linear discriminant analysis
Classification algorithms
feature selection
classification
trace ratio

Language

ISSN

2161-4407

Abstract

The growth of data today poses challenges for both data management and inference. While feature extraction methods can reduce the size of the data used for inference, they do not reduce the cost of data storage. Feature selection, on the other hand, removes redundant features and therefore helps not only with inference but also with reducing management costs. This work presents a novel parallel feature selection approach for classification, namely Parallel Feature Selection using Trace criterion (PFST), which scales up to very large datasets. Our method uses the trace criterion, a measure of class separability used in Fisher's Discriminant Analysis, to evaluate feature usefulness. We analyze the criterion's desirable properties theoretically. Based on the criterion, PFST rapidly finds important features in big datasets by first performing forward selection in parallel, with early removal of seemingly redundant features. After the most important features are included in the model, we re-check their contribution for possible interactions that may improve the fit. Lastly, we perform backward selection to identify possibly redundant features added by the forward steps. We evaluate our method via various experiments using Linear Discriminant Analysis as the classifier on the selected features. The experiments show that our method can produce a small set of features in a fraction of the time required by the other methods under comparison. In addition, the classifier trained on the features selected by PFST not only achieves better accuracy than classifiers trained on features chosen by other approaches, but can also achieve better accuracy than classification on all available features.
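To make the idea in the abstract concrete, the following is a minimal sketch of scoring a feature subset with a Fisher-style trace criterion and using it in a greedy forward pass. It is not the authors' PFST implementation: the abstract does not give the exact formula, so the classical form tr(S_w^{-1} S_b) is assumed here, and the function names `trace_criterion` and `forward_select` and the `min_gain` stopping threshold are hypothetical. The sketch runs sequentially and omits the early-removal, interaction re-check, and backward-selection stages.

```python
import numpy as np

def trace_criterion(X, y, features):
    """Assumed classical Fisher separability tr(S_w^{-1} S_b) on a feature subset."""
    Xs = X[:, features]
    overall_mean = Xs.mean(axis=0)
    k = len(features)
    S_w = np.zeros((k, k))  # within-class scatter
    S_b = np.zeros((k, k))  # between-class scatter
    for label in np.unique(y):
        Xc = Xs[y == label]
        mean_c = Xc.mean(axis=0)
        centered = Xc - mean_c
        S_w += centered.T @ centered
        diff = (mean_c - overall_mean).reshape(-1, 1)
        S_b += len(Xc) * (diff @ diff.T)
    # The pseudo-inverse guards against a singular within-class scatter matrix.
    return float(np.trace(np.linalg.pinv(S_w) @ S_b))

def forward_select(X, y, min_gain=1e-3):
    """Greedy forward pass: add the best-scoring feature until the gain drops below min_gain.

    min_gain is a hypothetical stopping threshold, not a parameter from the paper.
    """
    selected, remaining = [], list(range(X.shape[1]))
    current = 0.0
    while remaining:
        # Each candidate is scored independently of the others.
        scores = [trace_criterion(X, y, selected + [j]) for j in remaining]
        best = int(np.argmax(scores))
        if scores[best] - current < min_gain:
            break
        current = scores[best]
        selected.append(remaining.pop(best))
    return selected, current
```

Because each candidate feature in the inner loop is scored independently, that loop is the natural place to distribute work across processes; how PFST actually parallelizes the search is described in the paper, not here.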
