Academic Paper
Noisy Label Detection for Multi-labeled Malware
Document Type
Conference
Source
2024 IEEE 21st Consumer Communications & Networking Conference (CCNC), pp. 165-171, Jan. 2024
ISSN
2331-9860
Abstract
Malware attacks have become increasingly prevalent, making accurate and reliable malware detection essential. However, mislabeling, in which a sample is assigned a label (a noisy label) that differs from its true label, can significantly degrade the accuracy and reliability of malware detection. In this paper, we propose a new method for detecting noisy labels in multi-labeled malware datasets. Our approach relies on a new transformation that allows malware data carrying multiple labels to be treated as single-labeled data without losing essential information. Building on this transformation, we also introduce a new machine learning model for detecting mislabeled samples. We evaluated the proposed method in experiments on a real-world malware dataset, and the results indicate that it detects mislabels with high accuracy, up to 94.7%. Our research aims to improve labeling quality and reduce the factors that contribute to mislabeling in malware datasets, leading to more accurate and reliable malware detection.
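The abstract does not describe how the multi-label-to-single-label transformation works. For reference only, one well-known way to recast multi-label data as single-label data without discarding label information is the label powerset transformation, which gives each distinct combination of labels its own class id. The sketch that follows implements that generic technique and is not the authors' method; the function names `label_powerset` and `invert_powerset` and the example labels ("trojan", "downloader", "ransomware") are hypothetical.

```python
from typing import Dict, FrozenSet, Iterable, List, Set, Tuple


def label_powerset(
    label_sets: Iterable[Iterable[str]],
) -> Tuple[List[int], Dict[FrozenSet[str], int]]:
    """Hypothetical helper: map each sample's label set to a single class id.

    Generic label powerset transformation (not the paper's method): every
    distinct combination of labels becomes its own class, so label order
    does not matter and no label information is dropped.
    """
    mapping: Dict[FrozenSet[str], int] = {}
    class_ids: List[int] = []
    for labels in label_sets:
        key = frozenset(labels)  # order-independent, hashable label set
        if key not in mapping:
            mapping[key] = len(mapping)  # assign the next unused class id
        class_ids.append(mapping[key])
    return class_ids, mapping


def invert_powerset(
    class_ids: List[int], mapping: Dict[FrozenSet[str], int]
) -> List[Set[str]]:
    """Hypothetical helper: recover each sample's original label set."""
    reverse = {cid: key for key, cid in mapping.items()}
    return [set(reverse[cid]) for cid in class_ids]


if __name__ == "__main__":
    # Hypothetical multi-labeled malware samples.
    samples = [{"trojan", "downloader"}, {"ransomware"}, {"downloader", "trojan"}]
    ids, mapping = label_powerset(samples)
    print(ids)  # [0, 1, 0]
    print(invert_powerset(ids, mapping) == samples)  # True
```

Because the mapping is a one-to-one correspondence between observed label sets and class ids, the original labels can always be recovered; the trade-off is that the number of classes grows with the number of distinct label combinations in the dataset.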