e-Article

Large Scale Enrichment and Statistical Cyber Characterization of Network Traffic
Document Type
Conference
Source
2022 IEEE High Performance Extreme Computing Conference (HPEC), pp. 1-7, Sep. 2022
Subject
Communication, Networking and Broadcast Technologies
Computing and Processing
Observatories
Statistical analysis
Computational modeling
Telecommunication traffic
Sensor phenomena and characterization
Telescopes
Metadata
Cybersecurity
High Performance Computing
Big Data
Networks Scanning
Dimensional Analysis
Internet Modeling
Packet Capture
Streaming Graphs
Language
ISSN
2643-1971
Abstract
Modern network sensors continuously produce enormous quantities of raw data that are beyond the capacity of human analysts. Cross-correlation of network sensors increases this challenge by enriching every network event with additional metadata. These large volumes of enriched network data present opportunities to statistically characterize network traffic and quickly answer a key question: “What are the primary cyber characteristics of my network data?” The Python GraphBLAS and PyD4M analysis frameworks enable anonymized statistical analysis to be performed quickly and efficiently on very large network data sets. This approach is tested using billions of anonymized network data samples from the largest Internet observatory (CAIDA Telescope) and tens of millions of anonymized records from the largest commercially available background enrichment capability (GreyNoise). The analysis confirms that most of the enriched variables follow expected heavy-tail distributions and that a large fraction of the network traffic is due to a small number of cyber activities. This information can simplify the cyber analysts' task by enabling prioritization of cyber activities based on statistical prevalence.
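As an illustration of the kind of statistical characterization the abstract describes, the following is a minimal sketch in plain Python. It does not use Python GraphBLAS or PyD4M and is not the authors' implementation. The function name `characterize`, the `top_k` parameter, and the label values are hypothetical. It counts how often each enrichment label occurs, measures the share of events attributable to the most common labels, and bins the counts by powers of two, a common way to inspect a heavy-tailed distribution.

```python
from collections import Counter


def characterize(labels, top_k=2):
    """Summarize a list of per-event labels (hypothetical enrichment tags).

    Returns (counts, top_share, binned):
      counts    - occurrences of each label
      top_share - fraction of all events carried by the top_k labels
      binned    - number of labels whose count falls in [2**b, 2**(b+1))
    """
    counts = Counter(labels)
    total = sum(counts.values())
    top_share = sum(n for _, n in counts.most_common(top_k)) / total
    # n.bit_length() - 1 equals floor(log2(n)) for any positive integer n.
    binned = Counter(n.bit_length() - 1 for n in counts.values())
    return counts, top_share, dict(sorted(binned.items()))


# Synthetic example data, invented for illustration only.
events = (
    ["scanner_a"] * 64
    + ["scanner_b"] * 16
    + ["crawler"] * 4
    + ["bot_w", "bot_x", "bot_y", "bot_z"]
)
counts, top_share, binned = characterize(events, top_k=2)
print(f"top-2 share of events: {top_share:.3f}")
print("labels per power-of-two bin:", binned)
```

In this synthetic data, two labels carry most of the events while several labels appear only once, which is the qualitative pattern the abstract reports for the enriched variables.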