Academic Article

Genetic Algorithm for the Mutual Information-Based Feature Selection in Univariate Time Series Data
Document Type
article
Source
IEEE Access, Vol 8, Pp 9597-9609 (2020)
Subject
Feature selection
genetic algorithm
machine learning
deep learning
optimization methods
forecasting
Electrical engineering. Electronics. Nuclear engineering
TK1-9971
Language
English
ISSN
2169-3536
Abstract
Filters are the fastest among the different types of feature selection methods. They employ metrics from information theory, such as mutual information (MI), Joint MI (JMI), and minimal redundancy maximal relevance (mRMR). Determining the optimal feature subset is an NP-hard problem. This work proposes a Genetic Algorithm (GA) in which the fitness of a solution consists of two terms. The first is a feature selection metric such as MI, JMI, or mRMR, and the second is an overlapping coefficient that accounts for the diversity of the GA population. Experimental results show that the proposed algorithm can return multiple good-quality solutions that also have minimal overlap with each other. Having multiple solutions provides significant benefits when the test data contains null or missing values. Experiments were conducted using two publicly available time-series datasets. The selected feature sets are also used for forecasting with a simple Long Short-Term Memory (LSTM) model, and the forecasting quality obtained with the different feature sets is analyzed. The proposed algorithm was compared with a popular optimization tool, "Basic Open-source Nonlinear Mixed INteger programming" (BONMIN), and a recent feature selection algorithm, "Conditional Mutual Information Considering Feature Interaction" (CMFSI). The experiments show that the multiple solutions found by the proposed method have good quality and minimal overlap.
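The abstract describes a two-term fitness: an information-theoretic relevance score plus an overlap term that promotes diversity across the GA population. The sketch below is not the authors' code; it illustrates the idea under stated assumptions, using MI as the relevance metric, a Jaccard-style overlap measure, and an illustrative weight `alpha`.

```python
# Minimal sketch of a two-term GA fitness: MI-based relevance minus an
# overlap penalty against the rest of the population. The overlap measure
# (Jaccard coefficient) and the weight `alpha` are assumptions for illustration.
import numpy as np
from sklearn.feature_selection import mutual_info_regression

def fitness(candidate, X, y, population, alpha=0.5):
    """candidate: boolean mask over columns of X; population: list of other masks."""
    selected = np.flatnonzero(candidate)
    if selected.size == 0:
        return -np.inf
    # Term 1: relevance of the selected features to the target (filter metric).
    relevance = mutual_info_regression(X[:, selected], y).mean()
    # Term 2: average overlap with the other solutions in the population.
    overlaps = [
        np.logical_and(candidate, other).sum() / max(np.logical_or(candidate, other).sum(), 1)
        for other in population
    ]
    overlap = np.mean(overlaps) if overlaps else 0.0
    # Higher relevance and lower overlap yield higher fitness.
    return relevance - alpha * overlap
```

With this form, selection pressure favors feature subsets that are both informative about the target and distinct from other candidate subsets, which is how the paper motivates returning multiple minimally overlapping solutions.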