Academic Article

Online Learning From Incomplete and Imbalanced Data Streams

Document Type

Periodical

Author

You, D.; Xiao, J.; Wang, Y.; Yan, H.; Wu, D.; Chen, Z.; Shen, L.; Wu, X.

Source

IEEE Transactions on Knowledge and Data Engineering (IEEE Trans. Knowl. Data Eng.), 35(10):10650-10665, Oct. 2023

Subject

Computing and Processing
Heuristic algorithms
Real-time systems
Costs
Data mining
Classification algorithms
Optimization
Aerospace electronics
Data streams
F-measure
incomplete feature spaces
imbalanced data
online learning

Language

ISSN

1041-4347
1558-2191
2326-3865

Abstract

Learning with streaming data has attracted extensive research interest in recent years. Existing online learning approaches make specific assumptions about data streams, such as requiring fixed or varying feature spaces with explicit patterns and balanced class distributions. However, the data streams generated in many real scenarios commonly have arbitrarily incomplete feature spaces and dynamically imbalanced class distributions, which makes existing approaches unsuitable for real applications. To address this issue, this paper proposes a novel Online Learning from Incomplete and Imbalanced Data Streams (OLI$^{2}$DS) algorithm. OLI$^{2}$DS rests on two main ideas: 1) it follows the empirical risk minimization principle to identify the most informative features of incomplete feature spaces, and 2) it develops a dynamic cost strategy to handle imbalanced class distributions in real time by transforming F-measure optimization into a weighted surrogate loss minimization. To evaluate OLI$^{2}$DS, we compare it with state-of-the-art related algorithms in three kinds of experiments. First, we adopt 14 real datasets to simulate three scenarios of incomplete feature spaces, i.e., trapezoidal, feature evolvable, and capricious data streams. Second, based on a benchmark online analyzer, we generate 13 datasets to simulate incomplete data streams with different imbalance ratios. Third, we analyze concept drift in two simulated scenes, i.e., online learning and data stream mining, and verify the adaptation of OLI$^{2}$DS to repeated concept drifts and variable imbalance ratios. The results demonstrate that OLI$^{2}$DS achieves significantly better performance than its rivals. In addition, a real-world case study on movie review classification is conducted to illustrate the effectiveness of the OLI$^{2}$DS algorithm. Code is released at https://github.com/youdianlong/OLI2DS.
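
The two ideas named in the abstract (learning over instances whose feature sets vary, and weighting the loss by a cost that tracks class imbalance) can be illustrated with a small sketch. This is not the authors' OLI$^{2}$DS algorithm, whose details are not given in this record; see the released code for that. The class name `CostSensitiveOnlineLearner`, the smoothed cost formula, the learning rate, and the hinge-loss update are all assumptions chosen for illustration only.

```python
class CostSensitiveOnlineLearner:
    """Illustrative sketch only, NOT the OLI2DS algorithm.

    A linear model over sparse instances (dicts of feature -> value), so
    each instance may carry a different subset of features. Updates use a
    class-weighted hinge loss whose weights follow the running class counts.
    """

    def __init__(self, lr=0.1):
        self.lr = lr                  # assumed fixed learning rate
        self.w = {}                   # weights keyed by feature name
        self.n_pos = 0                # positive instances seen so far
        self.n_neg = 0                # negative instances seen so far
        self.tp = self.fp = self.fn = 0

    def score(self, x):
        # Only features present in this instance contribute to the score.
        return sum(self.w.get(f, 0.0) * v for f, v in x.items())

    def predict(self, x):
        return 1 if self.score(x) >= 0 else -1

    def _cost(self, y):
        # Hypothetical dynamic cost: the smoothed share of the *other* class,
        # so the rarer class receives the larger weight.
        other = self.n_neg if y == 1 else self.n_pos
        return (other + 1) / (self.n_pos + self.n_neg + 2)

    def update(self, x, y):
        """Predict first, then learn from the true label y in {+1, -1}."""
        y_hat = self.predict(x)
        # Track the confusion counts needed for a running F-measure.
        if y == 1 and y_hat == 1:
            self.tp += 1
        elif y == -1 and y_hat == 1:
            self.fp += 1
        elif y == 1 and y_hat == -1:
            self.fn += 1
        # Update class counts, which drive the dynamic cost.
        if y == 1:
            self.n_pos += 1
        else:
            self.n_neg += 1
        # Cost-weighted hinge-loss step, applied only on margin violations.
        if y * self.score(x) < 1:
            c = self._cost(y)
            for f, v in x.items():
                self.w[f] = self.w.get(f, 0.0) + self.lr * c * y * v
        return y_hat

    def f1(self):
        # Running F1 over the predictions made so far.
        denom = 2 * self.tp + self.fp + self.fn
        return 2 * self.tp / denom if denom else 0.0


if __name__ == "__main__":
    learner = CostSensitiveOnlineLearner(lr=0.5)
    stream = [({"a": 1.0}, 1), ({"b": 1.0}, -1), ({"a": 1.0, "c": 0.5}, 1)]
    for x, y in stream:
        learner.update(x, y)
    print(learner.predict({"a": 1.0}), learner.f1())
```

Keying the weights by feature name means a feature that is absent from an instance is simply skipped, and a previously unseen feature starts at weight zero. The sketch only reweights a standard hinge loss and reports F1 as a monitoring metric; it does not reproduce the paper's transformation of F-measure optimization into a surrogate loss.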
