Academic Paper

A Complete Review on the Application of Statistical Methods for Evaluating Internet Traffic Usage
Document Type
Periodical
Source
IEEE Access. 10:128433-128455, 2022
Subject
Aerospace
Bioengineering
Communication, Networking and Broadcast Technologies
Components, Circuits, Devices and Systems
Computing and Processing
Engineered Materials, Dielectrics and Plasmas
Engineering Profession
Fields, Waves and Electromagnetics
General Topics for Engineers
Geoscience
Nuclear Engineering
Photonics and Electrooptics
Power, Energy and Industry Applications
Robotics and Control Systems
Signal Processing and Analysis
Transportation
Internet
Statistical analysis
Protocols
Support vector machines
Correlation
Real-time systems
Quality of service
Encrypted internet traffic
traffic classification
statistical distances
statistical divergences
statistical methods
support vector machines
Language
ISSN
2169-3536
Abstract
Internet traffic classification aims to identify the kind of Internet traffic. With the rise of traffic encryption and multi-layer data encapsulation, some classic classification methods have lost their strength. In an attempt to increase classification performance, Machine Learning (ML) strategies have gained the scientific community's interest and have shown promise for the future of traffic classification, mainly in the recognition of encrypted traffic. However, some of these methods have high computational resource consumption, which makes them unfeasible for classifying large traffic flows or for real-time classification. Methods using statistical analysis have been used to classify real-time traffic or large traffic flows, where the main objective is to find statistical differences among flows or to find a pattern in traffic characteristics through statistical properties that allow traffic classification. The purpose of this work is to address statistical methods for classifying Internet traffic that have been little explored or unexplored in the literature. This work is not generally focused on discussing statistical methodology; it focuses on discussing statistical tools applied to Internet traffic classification. Thus, we provide an overview of statistical distances and divergences previously used or with potential to be used in the classification of Internet traffic. Then, we review previous works on Internet traffic classification using statistical methods, namely Euclidean, Bhattacharyya, and Hellinger distances, Jensen-Shannon and Kullback–Leibler (KL) divergences, Support Vector Machines (SVM), Correlation Information (Pearson Correlation), Kolmogorov-Smirnov and Chi-Square tests, and Entropy. We also discuss some open issues and future research directions on Internet traffic classification using statistical methods.
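As an illustration of the kind of statistical distance the abstract names, the following is a minimal sketch that compares two traffic flows by their packet-size histograms. It is not taken from the paper: the function names, the histogram binning, and the sample counts are all hypothetical, and the formulas are the standard textbook definitions of the Hellinger and Bhattacharyya distances and the Kullback–Leibler and Jensen-Shannon divergences (base-2 logarithms).

```python
import math


def normalize(counts):
    """Turn raw bin counts into a discrete probability distribution."""
    total = sum(counts)
    if total <= 0:
        raise ValueError("histogram must have positive total mass")
    return [c / total for c in counts]


def hellinger(p, q):
    """Hellinger distance; 0 for identical, 1 for disjoint distributions."""
    s = sum((math.sqrt(a) - math.sqrt(b)) ** 2 for a, b in zip(p, q))
    return math.sqrt(0.5 * s)


def bhattacharyya(p, q):
    """Bhattacharyya distance: -ln of the Bhattacharyya coefficient."""
    bc = sum(math.sqrt(a * b) for a, b in zip(p, q))
    return -math.log(bc) if bc > 0 else math.inf


def kl_divergence(p, q):
    """Kullback-Leibler divergence D(p || q) in bits (asymmetric)."""
    total = 0.0
    for a, b in zip(p, q):
        if a == 0:
            continue  # a zero-probability bin in p contributes nothing
        if b == 0:
            return math.inf  # p has mass where q has none
        total += a * math.log2(a / b)
    return total


def jensen_shannon(p, q):
    """Jensen-Shannon divergence in bits; symmetric, bounded by 1."""
    m = [(a + b) / 2 for a, b in zip(p, q)]
    return 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)


# Hypothetical packet-size histograms (same bins) for two flows.
flow_a = normalize([120, 30, 10, 40])
flow_b = normalize([20, 25, 15, 140])

distances = {
    "hellinger": hellinger(flow_a, flow_b),
    "bhattacharyya": bhattacharyya(flow_a, flow_b),
    "kl_a_to_b": kl_divergence(flow_a, flow_b),
    "jensen_shannon": jensen_shannon(flow_a, flow_b),
}
```

In a classifier built along these lines, an unknown flow's histogram would be compared against reference histograms for each traffic class and assigned to the closest one. Which features, bins, and decision thresholds work in practice is an empirical question that this sketch does not address.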