Academic Paper

Feature Selection in High-Dimensional Space with Applications to Gene Expression Data
Document Type
Conference
Source
SoutheastCon 2024, pp. 6-15, Mar. 2024
Subject
Aerospace
Bioengineering
Communication, Networking and Broadcast Technologies
Components, Circuits, Devices and Systems
Computing and Processing
Engineered Materials, Dielectrics and Plasmas
Engineering Profession
Fields, Waves and Electromagnetics
General Topics for Engineers
Geoscience
Nuclear Engineering
Photonics and Electrooptics
Power, Energy and Industry Applications
Robotics and Control Systems
Signal Processing and Analysis
Transportation
Machine learning algorithms
Computational modeling
NASA
Predictive models
Metadata
Feature extraction
Gene expression
Language
ISSN
1558-058X
Abstract
Recent years have seen rapid growth in high-dimensional datasets. Most existing machine learning (ML) algorithms struggle in high-dimensional settings, where many features may be redundant. Feature selection is therefore a critical step in such settings: it identifies the most relevant features while removing redundant ones. As dimensionality increases, selection methods also face problems of efficiency and interpretability. This paper therefore proposes a novel feature selection framework that uses an ensemble of interpretable ML algorithms to select features and rank the final set. The framework is applied to a gene expression dataset obtained through collaboration with the National Aeronautics and Space Administration (NASA) Biological and Physical Sciences (BPS) team, where it helps identify the genes most relevant to specific target attributes through classification tasks.
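The abstract does not name the algorithms in the ensemble or the rule used to combine them, so the following is only a minimal sketch of the general idea, under stated assumptions: three simple, interpretable per-feature scorers (absolute correlation, scaled class-mean gap, and a one-threshold decision stump) each rank the features, and the final ranking is the mean rank across scorers. All function names, the choice of scorers, the mean-rank aggregation, and the synthetic toy data are hypothetical stand-ins, not the paper's method or the NASA dataset.

```python
import random
import statistics

# Hypothetical scorers: stand-ins for the "interpretable ML algorithms"
# mentioned in the abstract, which does not say which ones are used.

def abs_correlation(xs, ys):
    """Absolute Pearson correlation between one feature and a 0/1 label."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return abs(cov / (sx * sy)) if sx > 0 and sy > 0 else 0.0

def mean_gap(xs, ys):
    """Absolute gap between class means, scaled by the overall std. dev."""
    pos = [x for x, y in zip(xs, ys) if y == 1]
    neg = [x for x, y in zip(xs, ys) if y == 0]
    spread = statistics.pstdev(xs)
    if not pos or not neg or spread == 0:
        return 0.0
    return abs(statistics.fmean(pos) - statistics.fmean(neg)) / spread

def stump_accuracy(xs, ys):
    """Best training accuracy of a one-threshold decision stump."""
    best = 0.0
    for t in sorted(set(xs)):
        hits = sum((x > t) == (y == 1) for x, y in zip(xs, ys))
        acc = hits / len(ys)
        best = max(best, acc, 1 - acc)  # allow either polarity of the split
    return best

SCORERS = (abs_correlation, mean_gap, stump_accuracy)

def ensemble_rank(X, y, scorers=SCORERS):
    """Return feature indices ordered by mean rank across scorers, best first.

    Assumption: mean-rank aggregation; the paper may combine rankings differently.
    """
    n_features = len(X[0])
    columns = [[row[j] for row in X] for j in range(n_features)]
    rank_sum = [0] * n_features
    for scorer in scorers:
        scores = [scorer(col, y) for col in columns]
        order = sorted(range(n_features), key=lambda j: -scores[j])
        for rank, j in enumerate(order):
            rank_sum[j] += rank
    return sorted(range(n_features), key=lambda j: rank_sum[j])

def make_toy_data(n_samples=200, n_features=20, informative=(0, 1), seed=0):
    """Synthetic stand-in for expression data: only `informative` columns shift with the label."""
    rng = random.Random(seed)
    y = [rng.randint(0, 1) for _ in range(n_samples)]
    X = []
    for label in y:
        row = [rng.gauss(0, 1) for _ in range(n_features)]
        for j in informative:
            row[j] += 3.0 * label
        X.append(row)
    return X, y

X, y = make_toy_data()
ranking = ensemble_rank(X, y)
top_two = sorted(ranking[:2])  # the two informative columns should lead the ranking
```

Because every scorer is computed per feature and combined by rank, the contribution of each scorer to a feature's final position can be read off directly, which is one plausible reading of "interpretable" here. A real gene expression study would also need cross-validation and a redundancy check between selected genes, neither of which this sketch attempts.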