Academic Paper

Visual Structural Assessment and Anomaly Detection for High-Velocity Data Streams
Document Type
Periodical
Source
IEEE Transactions on Cybernetics (IEEE Trans. Cybern.), 51(12):5979-5992, Dec. 2021
Subject
Signal Processing and Analysis
Communication, Networking and Broadcast Technologies
Robotics and Control Systems
General Topics for Engineers
Components, Circuits, Devices and Systems
Computing and Processing
Power, Energy and Industry Applications
Streaming media
Clustering algorithms
Data visualization
Microsoft Windows
Visualization
Data models
Heating systems
Big data
change detection
Internet of Things (IoT)
streaming data
visual assessment of tendency
visual cluster footprint
Language
ISSN
2168-2267
2168-2275
Abstract
The widespread use of Internet-of-Things (IoT) technologies, smartphones, and social media services generates huge amounts of data streaming at high velocity. Automatic interpretation of these rapidly arriving data streams is required for the timely detection of interesting events, which usually emerge in the form of clusters. This article proposes a new relative of the visual assessment of cluster tendency (VAT) model, which produces a record of structural evolution in the data stream by building a cluster heat map of the entire processing history of the stream. The existing VAT-based algorithms for streaming data, called inc-VAT/inc-iVAT and dec-VAT/dec-iVAT, are not suitable for high-velocity, high-volume streaming data because their memory requirements grow and their processing slows as the accumulated data increases. The scalable iVAT (siVAT) algorithm can handle big batch data, but for streaming data it must be reapplied every time a new data point arrives, which is infeasible because of the computational cost. To address this problem, we propose an incremental siVAT algorithm, called inc-siVAT, which processes the streaming data in chunks. It first extracts a small smart sample using an intelligent sampling scheme called maximin random sampling (MMRS); it then incrementally updates the smart sample points on the fly, using our novel incremental MMRS (inc-MMRS) algorithm, to reflect changes in the data stream after each chunk is processed; finally, it produces an incrementally built iVAT image of the updated smart sample using the inc-VAT/inc-iVAT and dec-VAT/dec-iVAT algorithms. These images can be used to visualize the evolving cluster structure and to detect anomalies in streaming data. Our method is illustrated with one synthetic and four real datasets, two of which evolve significantly over time.
Our numerical experiments demonstrate the algorithm’s ability to successfully identify anomalies and visualize changing cluster structure in streaming data.
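
The two batch building blocks named in the abstract, maximin sampling and VAT reordering, can be illustrated with a minimal sketch. This is not the authors' inc-siVAT implementation: it omits the random-sampling half of MMRS, the iVAT path-based transform, and all incremental (inc-/dec-) updates. The function names `maximin_sample` and `vat_order`, the toy two-cluster data, and the sample size of 10 are assumptions made only for illustration.

```python
import math
import random


def maximin_sample(points, k, start=0):
    """Greedily pick k indices so each new pick is farthest from those already chosen."""
    chosen = [start]
    # Distance from every point to its nearest chosen point so far.
    nearest = [math.dist(p, points[start]) for p in points]
    for _ in range(k - 1):
        nxt = max(range(len(points)), key=lambda i: nearest[i])
        chosen.append(nxt)
        nearest = [min(d, math.dist(p, points[nxt])) for d, p in zip(nearest, points)]
    return chosen


def vat_order(D):
    """Return a VAT-style ordering of a square dissimilarity matrix D (list of lists)."""
    n = len(D)
    # Start at one endpoint of the largest dissimilarity in D.
    first = max(range(n), key=lambda i: max(D[i]))
    order = [first]
    remaining = set(range(n)) - {first}
    # Smallest dissimilarity from each point to the already ordered set.
    link = list(D[first])
    while remaining:
        # Prim-like step: append the unordered point closest to the ordered set.
        nxt = min(remaining, key=lambda j: link[j])
        order.append(nxt)
        remaining.remove(nxt)
        link = [min(a, b) for a, b in zip(link, D[nxt])]
    return order


# Toy data (hypothetical): two well-separated 2-D Gaussian clusters.
random.seed(0)
cluster_a = [(random.gauss(0, 0.1), random.gauss(0, 0.1)) for _ in range(50)]
cluster_b = [(random.gauss(5, 0.1), random.gauss(5, 0.1)) for _ in range(50)]
points = cluster_a + cluster_b

sample_idx = maximin_sample(points, 10)
sample = [points[i] for i in sample_idx]
D = [[math.dist(p, q) for q in sample] for p in sample]
order = vat_order(D)
# Reordered matrix; displayed as an image, clusters appear as dark diagonal blocks.
D_star = [[D[i][j] for j in order] for i in order]
```

Rendering `D_star` as a grayscale image would give the kind of cluster heat map the abstract refers to. In a streaming setting, rerunning this batch procedure for every arriving point is the cost that the abstract says inc-siVAT is designed to avoid.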