Academic Paper

Noisy Label Detection for Multi-labeled Malware
Document Type
Conference
Source
2024 IEEE 21st Consumer Communications & Networking Conference (CCNC), pp. 165-171, Jan. 2024
Subject
Communication, Networking and Broadcast Technologies
Computing and Processing
Robotics and Control Systems
Machine learning
Predictive models
Malware
Reliability
Noise measurement
Security
Labeling
noisy label
malware dataset
machine learning
Language
ISSN
2331-9860
Abstract
Malware attacks have become increasingly prevalent, and accurate, reliable malware detection is essential for combating them. However, mislabeling can significantly degrade the accuracy and reliability of malware detection. Mislabeling occurs when a sample is assigned a label that differs from its true label (a noisy label). In this paper, we propose a new method for detecting noisy labels in multi-labeled malware datasets. Our approach uses a new transformation method that lets malware data with multiple labels be treated as single-labeled data without losing essential information. Building on this transformation, we also introduce a new machine learning model for detecting mislabeled samples. We evaluated the proposed method in experiments on a real-world malware dataset. The results indicate that it detects mislabels with high accuracy, up to 94.7%. Our research aims to improve labeling quality and reduce the factors that contribute to mislabeling in malware datasets, leading to more accurate and reliable malware detection.
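The abstract does not specify how the multi-label-to-single-label transformation or the detection model works. The sketch below is therefore only an illustration built from two generic, well-known techniques, and it is not the authors' method. The first technique is the label powerset transformation, in which each distinct combination of labels becomes one class ID and a decoder table keeps the mapping reversible. The second is a simple k-nearest-neighbor disagreement filter that flags samples whose class differs from most of their neighbors. The function names `label_powerset` and `knn_suspect_labels` are hypothetical. So are the 2-D feature vectors, the family labels, `k=3`, and the 0.5 agreement threshold. Real malware features would be far higher-dimensional.

```python
from collections.abc import Hashable, Sequence
from math import dist


def label_powerset(label_sets: Sequence[set[Hashable]]) -> tuple[list[int], dict[int, frozenset]]:
    """Map each distinct label combination to one integer class ID.

    The returned decoder maps class IDs back to the original label sets,
    so the transformation loses no label information.
    """
    encoder: dict[frozenset, int] = {}
    ids: list[int] = []
    for labels in label_sets:
        key = frozenset(labels)  # label order inside a set does not matter
        if key not in encoder:
            encoder[key] = len(encoder)  # IDs assigned in order of first appearance
        ids.append(encoder[key])
    decoder = {class_id: key for key, class_id in encoder.items()}
    return ids, decoder


def knn_suspect_labels(
    features: Sequence[Sequence[float]],
    class_ids: Sequence[int],
    k: int = 3,
    min_agreement: float = 0.5,
) -> list[int]:
    """Flag samples whose class disagrees with most of their k nearest neighbors.

    Generic k-NN noise filter, used here for illustration only (hypothetical
    stand-in for the paper's unspecified detection model).
    """
    suspects: list[int] = []
    for i, x in enumerate(features):
        # Sort all other samples by Euclidean distance to sample i, keep k.
        neighbors = sorted(
            (j for j in range(len(features)) if j != i),
            key=lambda j: dist(x, features[j]),
        )[:k]
        if not neighbors:
            continue  # a single sample has no neighbors to compare against
        agreeing = sum(1 for j in neighbors if class_ids[j] == class_ids[i])
        if agreeing / len(neighbors) < min_agreement:
            suspects.append(i)
    return suspects


# Hypothetical toy data: two feature clusters; sample 3 sits in the
# "trojan + downloader" cluster but carries a "ransomware" label.
features = [
    (0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (0.1, 0.1),
    (5.0, 5.0), (5.1, 5.0), (5.0, 5.1), (5.1, 5.1),
]
labels = [
    {"trojan", "downloader"}, {"trojan", "downloader"},
    {"downloader", "trojan"}, {"ransomware"},
    {"ransomware"}, {"ransomware"}, {"ransomware"}, {"ransomware"},
]

ids, decoder = label_powerset(labels)
suspects = knn_suspect_labels(features, ids)
print(ids)       # [0, 0, 0, 1, 1, 1, 1, 1]
print(suspects)  # [3]
print(sorted(decoder[ids[suspects[0]]]))  # ['ransomware']
```

The powerset step is lossless because every class ID decodes back to exactly one label set. Its usual drawback is that rare label combinations become small classes, which a neighbor-based filter may flag too eagerly. Any real detector would need to be tuned and validated on labeled data.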