Academic Paper

Feature selection using non-dominant features-guided search for gene expression profile data
Document Type
Academic Journal
Source
Complex & Intelligent Systems. December, 2023, Vol. 9 Issue 6, p6139, 15 p.
Subject
Algorithm
Gene expression -- Analysis -- Models
Genes -- Analysis -- Models
Algorithms -- Analysis -- Models
Language
English
Abstract
Gene expression profile data are high-dimensional but contain only a small number of samples. These characteristics lead to long training times and poor performance when predictive models are constructed. To address this issue, the paper proposes a feature selection algorithm using non-dominant features-guided search. The algorithm adopts a filtering framework based on feature sorting and a search strategy to overcome both problems. First, features are pre-selected according to the calculated feature-category correlation. Second, a multi-objective optimization feature selection model is constructed, and non-dominant features are defined according to Pareto dominance theory. Combined with a bidirectional search strategy, the Pareto-dominated features under the current feature with maximum category relevance are removed one by one. Finally, the optimal feature subset with maximum correlation and minimum redundancy is obtained. Experimental results on six gene expression data sets show that the algorithm performs much better than the Fisher score, maximum information coefficient, composition of feature relevancy, mini-batch K-means normalized mutual information feature inclusion, and max-Relevance and Min-Redundancy algorithms. Compared with the feature selection method based on maximum information coefficient and approximate Markov blanket, the algorithm not only has high computational efficiency but also obtains better classification capability with a lower-dimensional feature subset.
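The Pareto dominance idea named in the abstract can be illustrated with a minimal sketch. It is not the paper's algorithm: the abstract does not give the objective definitions, so the sketch assumes two per-feature scores, a relevance to be maximized and a redundancy to be minimized. The function names `dominates` and `non_dominated` and the sample scores are hypothetical.

```python
from typing import List, Sequence, Tuple


def dominates(a: Tuple[float, float], b: Tuple[float, float]) -> bool:
    """Return True if point a Pareto-dominates point b.

    Each point is (relevance, redundancy). Assumption for this sketch:
    relevance is maximized and redundancy is minimized.
    """
    no_worse = a[0] >= b[0] and a[1] <= b[1]
    strictly_better = a[0] > b[0] or a[1] < b[1]
    return no_worse and strictly_better


def non_dominated(relevance: Sequence[float],
                  redundancy: Sequence[float]) -> List[int]:
    """Return indices of features that no other feature dominates."""
    points = list(zip(relevance, redundancy))
    keep = []
    for i, p in enumerate(points):
        # A feature is kept only if every other feature fails to dominate it.
        if not any(dominates(q, p) for j, q in enumerate(points) if j != i):
            keep.append(i)
    return keep


# Hypothetical scores for four features (not data from the paper).
relevance = [0.9, 0.8, 0.5, 0.7]
redundancy = [0.6, 0.2, 0.1, 0.4]
print(non_dominated(relevance, redundancy))
```

In this toy input, feature 3 is dominated by feature 1 (higher relevance and lower redundancy), so only features 0, 1, and 2 remain on the Pareto front. The pairwise check is quadratic in the number of features, which is one reason a pre-selection step like the one the abstract describes would be useful before any such comparison.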
Author(s)
Xiaoying Pan (1), Jun Sun (2), Huimin Yu (2), Yufeng Xue (2)
Author Affiliations
(1) https://ror.org/04jn0td46, grid.464492.9, 0000 0001 0158 6320, Shaanxi Key Laboratory of Network Data Analysis and [...]