Academic Paper

Hypersparse Network Flow Analysis of Packets with GraphBLAS
Document Type
Conference
Source
2022 IEEE High Performance Extreme Computing Conference (HPEC), pp. 1-7, Sep. 2022
Subject
Communication, Networking and Broadcast Technologies
Computing and Processing
Data privacy
Aggregates
Instruction sets
Telecommunication traffic
Internet
Information filtering
Sparse matrices
network analyses
compression
streaming graphs
hypersparse matrices
ISSN
2643-1971
Abstract
Internet analysis is a major challenge due to the volume and rate of network traffic. In lieu of analyzing traffic as raw packets, network analysts often rely on compressed network flows (netflows) that contain the start time, stop time, source, destination, and number of packets in each direction. However, many traffic analyses benefit from temporal aggregation of multiple simultaneous netflows, which can be computationally challenging. To alleviate this concern, a novel netflow compression and resampling method has been developed that leverages GraphBLAS hypersparse traffic matrices, which preserve anonymization while enabling subrange analysis. Standard multi-temporal spatial analyses are then performed on each subrange to generate detailed statistical aggregates of the source packets, source fan-out, unique links, destination fan-in, and destination packets of each subrange, which can then be used for background modeling and anomaly detection. A simple file format based on GraphBLAS sparse matrices is developed for storing these statistical aggregates. This method is scale tested on the MIT SuperCloud using a 50 trillion packet netflow corpus from several hundred sites collected over several months. The resulting compression achieved is significant (