Academic Paper

Self-Supervised Learning Malware Traffic Classification Based on Masked Autoencoder
Document Type
Periodical
Source
IEEE Internet of Things Journal (IEEE Internet Things J.), 11(10):17330-17340, May 2024
Subject
Computing and Processing
Communication, Networking and Broadcast Technologies
Task analysis
Feature extraction
Malware
Internet of Things
Training
Self-supervised learning
Cryptography
Malware traffic classification (MTC)
masked autoencoder
self-supervised learning (SSL)
transformer
Language
ISSN
2327-4662
2372-2541
Abstract
Malware traffic classification (MTC) is an important technique for securing cyberspace; it aims to detect anomalies and classify different types of network traffic. Recently, MTC methods based on deep learning (DL) have shown excellent performance. However, these DL-based methods rely on data sets of manually labeled samples for training, which are costly and hard to obtain. To address this problem, this article proposes a novel self-supervised MTC method based on the masked autoencoder (MAE) framework. Specifically, MAE first constructs an unsupervised pretext task with a random masking strategy, which reduces redundant information in the samples and speeds up pretraining. A transformer-based backbone network then efficiently extracts features from the nonredundant traffic data. The proposed MTC-MAE method applies self-supervised learning to a large-scale unlabeled data set to acquire unbiased features, and then fine-tunes on specific data sets to adapt to diverse traffic classification scenarios. Simulation experiments show that MTC-MAE learns high-quality universal features and achieves excellent classification performance on various downstream data sets. The data sets, code implementation, and pretrained models are available on GitHub at https://github.com/TsuiHark/Self-supervised_MTC.
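The abstract describes the MAE pretext task only at a high level. The sketch below illustrates generic MAE-style random masking applied to a raw traffic byte sequence, under stated assumptions: the sample is split into fixed-size patches, a random subset is hidden, and only the visible patches would be fed to the transformer encoder. The byte-patch representation, `patch_size`, the 0.75 `mask_ratio` (the default of the original image MAE, not necessarily the value MTC-MAE uses), and the function names `patchify`, `random_masking`, and `fill_mask_tokens` are hypothetical and are not taken from the authors' GitHub code.

```python
import random
from typing import List, Tuple


def patchify(sample: bytes, patch_size: int) -> List[bytes]:
    """Split a traffic sample into fixed-size byte patches (hypothetical layout).

    The last patch is zero-padded so every patch has exactly patch_size bytes.
    """
    padded_len = -(-len(sample) // patch_size) * patch_size  # round up to a multiple
    padded = sample.ljust(padded_len, b"\x00")
    return [padded[i:i + patch_size] for i in range(0, padded_len, patch_size)]


def random_masking(
    patches: List[bytes], mask_ratio: float, rng: random.Random
) -> Tuple[List[bytes], List[int], List[int]]:
    """Generic MAE-style random masking (a sketch, not the MTC-MAE implementation).

    Returns (visible patches, indices of visible patches, mask), where
    mask[i] == 1 means patch i is hidden from the encoder.
    """
    if not 0.0 <= mask_ratio < 1.0:
        raise ValueError("mask_ratio must be in [0, 1)")
    n = len(patches)
    len_keep = int(n * (1 - mask_ratio))
    order = list(range(n))
    rng.shuffle(order)                 # random permutation of patch positions
    keep = sorted(order[:len_keep])    # keep visible patches in original order
    keep_set = set(keep)
    mask = [0 if i in keep_set else 1 for i in range(n)]
    visible = [patches[i] for i in keep]
    return visible, keep, mask


def fill_mask_tokens(
    visible: List[bytes], keep: List[int], n: int, mask_token: bytes
) -> List[bytes]:
    """Rebuild a full-length sequence for the decoder, placing mask_token
    at every hidden position and the visible patches at their original slots."""
    full = [mask_token] * n
    for pos, patch in zip(keep, visible):
        full[pos] = patch
    return full


if __name__ == "__main__":
    rng = random.Random(0)
    patches = patchify(bytes(range(40)), patch_size=4)   # 10 patches
    visible, keep, mask = random_masking(patches, mask_ratio=0.75, rng=rng)
    print(len(patches), len(visible), sum(mask))          # 10 2 8
```

Because the encoder in this sketch only processes the visible patches, a high mask ratio shrinks its input sequence, which is consistent with the abstract's statement that masking speeds up pretraining; the decoder would then receive the full-length sequence from `fill_mask_tokens` and be trained to reconstruct the hidden patches.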