Academic Paper

Ensemble Feature Selection: Are Stability Metrics a Proxy or a Complement to Predictive Performance?
Document Type
Conference
Source
2021 13th International Conference on Bioinformatics and Biomedical Technology. :108-115
Subject
ensemble methods
feature selection
feature subsets merging
performance
stability metrics
Language
English
Abstract
Proper identification of biomarkers, which are used in drug development, is critical, as the race to find a COVID-19 vaccine has shown. Gene-expression-based marker discovery often requires feature selection. However, many feature selection methods exist, and they do not select the same feature subsets for the same dataset, so users must often choose which subset to use. Several approaches have been proposed to guide this choice, among which ensemble methods (i.e., combining subsets from multiple methods) have recently gained attention. An ensemble approach raises two issues: the stability of the feature subsets being combined and the classification performance of the combined feature subsets. How stability and performance relate is therefore of interest, and it is the central topic investigated in this paper. First, 5/6 different feature selection methods are used to create feature subsets for 3 different transcriptomics datasets. Then, the stability and performance of these feature subsets under a given merging strategy are computed using 5 stability metrics and 3 performance metrics for 3 different classifiers. Our results suggest that performance and stability criteria are complementary and conflicting, and that both must be considered when deciding on the final selected feature subsets. We use two reference metrics to illustrate such selection.
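The idea of measuring the stability of feature subsets and then merging them can be illustrated with a minimal sketch. The abstract does not name the 5 stability metrics or the merging strategy used in the paper, so the choices here are assumptions made purely for illustration: stability is taken to be the mean pairwise Jaccard index, merging is a simple vote threshold, and the gene names and function names are hypothetical.

```python
from itertools import combinations


def jaccard(a, b):
    """Jaccard index of two feature subsets: |A & B| / |A | B|."""
    a, b = set(a), set(b)
    union = a | b
    # Two empty subsets are treated as identical (assumption).
    return len(a & b) / len(union) if union else 1.0


def mean_pairwise_jaccard(subsets):
    """Illustrative stability score: average Jaccard index over all pairs."""
    pairs = list(combinations(subsets, 2))
    if not pairs:
        raise ValueError("need at least two feature subsets")
    return sum(jaccard(a, b) for a, b in pairs) / len(pairs)


def merge_by_votes(subsets, min_votes):
    """Illustrative merging: keep features chosen by at least min_votes methods.

    min_votes=1 gives the union; min_votes=len(subsets) gives the intersection.
    """
    counts = {}
    for subset in subsets:
        for feature in set(subset):
            counts[feature] = counts.get(feature, 0) + 1
    return sorted(f for f, c in counts.items() if c >= min_votes)


# Hypothetical subsets, as if produced by three feature selection methods.
subsets = [
    {"g1", "g2", "g3"},
    {"g2", "g3", "g4"},
    {"g1", "g2", "g4"},
]

stability = mean_pairwise_jaccard(subsets)
consensus = merge_by_votes(subsets, min_votes=3)
majority = merge_by_votes(subsets, min_votes=2)
```

A stricter vote threshold yields a smaller merged subset, which may then be scored with a classifier to compare predictive performance against the stability score; that evaluation step is outside this sketch.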
