Academic Article

Reduced-Space Multistream Classification Based on Multiobjective Evolutionary Optimization
Document Type
Periodical
Source
IEEE Transactions on Evolutionary Computation, 27(4):764-777, Aug. 2023
Subject
Computing and Processing
Feature extraction
Data models
Adaptation models
Optimization
Predictive models
Buildings
Labeling
Concept drift
domain adaptation
feature selection
multiobjective optimization
multistream classification
Language
English
ISSN
1089-778X
1941-0026
Abstract
In traditional data stream mining, classification models are typically trained on labeled samples from a single source. In real-world scenarios, however, obtaining accurate labels is difficult and expensive, especially when multiple data streams are concurrently sampled from the same environment or process. To address this issue, multistream classification has been proposed, in which a data stream with biased labels (called the source stream) is leveraged to train a suitable model for prediction over another stream with unlabeled samples (called the target stream). Despite growing research in this field, previous multistream classification methods are mostly designed for single-source stream scenarios, yet multiple source streams contain diverse data distributions and thus provide more valuable information for building an accurate model. In addition, previous works construct classification models in the original shared feature space, ignoring the effect of redundant or low-quality features on classification performance, which can make knowledge transfer across streams inefficient. In view of this, a reduced-space multistream classification method based on multiobjective evolutionary optimization is proposed in this article. First, multiobjective evolutionary optimization is employed to seek the most valuable feature subset shared by the source and target domains, with the purpose of narrowing the distribution difference between source and target streams. Following that, a Gaussian mixture model-based weighting mechanism for source samples is presented. Moreover, two drift adaptation methods are proposed to address asynchronous drift. Experimental results on benchmark datasets show that the proposed method outperforms comparative methods in terms of classification accuracy and G-mean.
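The feature-subset search described in the abstract optimizes two competing objectives at once (roughly, predictive quality on the source stream versus source/target distribution discrepancy in the reduced space). The paper's exact objectives and evolutionary operators are not given here, so the following is only a minimal, stdlib-only sketch of the Pareto-dominance filtering that any such multiobjective search relies on; the candidate subset names and objective values are illustrative, not taken from the paper.

```python
# Sketch of Pareto-dominance filtering for a two-objective feature-subset
# search, with both objectives minimized: a proxy for classification error
# on the source stream, and a measure of source/target distribution
# discrepancy in the reduced feature space. Values below are illustrative.

def dominates(a, b):
    """True if objective vector a Pareto-dominates b (minimization)."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))

def pareto_front(candidates):
    """Keep only candidates not dominated by any other candidate."""
    return [
        (name, objs) for name, objs in candidates
        if not any(dominates(other, objs) for _, other in candidates if other != objs)
    ]

candidates = [
    ("subset_A", (0.12, 0.40)),  # low error, high discrepancy
    ("subset_B", (0.20, 0.15)),  # balanced trade-off point
    ("subset_C", (0.25, 0.45)),  # dominated by both A and B
    ("subset_D", (0.30, 0.10)),  # high error, low discrepancy
]

front = pareto_front(candidates)
print([name for name, _ in front])  # subset_C drops out; the rest form the front
```

A full method would evolve bit-mask encodings of feature subsets (e.g., with NSGA-II-style selection) and pick one trade-off solution from the resulting front; the dominance test above is the piece that makes the "multiobjective" part concrete.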