Academic Paper

Android malware detection system integrating block feature extraction and multi-head attention mechanism
Document Type
Conference
Source
2020 International Computer Symposium (ICS), pp. 408-413, Dec. 2020
Subject
Communication, Networking and Broadcast Technologies
Components, Circuits, Devices and Systems
Computing and Processing
General Topics for Engineers
Power, Energy and Industry Applications
Robotics and Control Systems
Deep learning
Recurrent neural networks
Semantics
Feature extraction
Malware
Timing
Task analysis
Android
multi-head attention
Transformer
LSTM
Static Analysis
Language
English
Abstract
With the rapid development of deep learning, mobile malware detection has made breakthrough progress. However, time-series deep learning models still suffer from vanishing gradients on long input sequences because of the limited memory of recurrent neural networks. Many subsequent studies have proposed feature compression and extraction methods for long sequences, but none, to our knowledge, compresses a sequence while preserving both the complete feature information of the original sequence and its semantic-temporal relationships. This paper therefore proposes a multi-model malware detection architecture that covers global features while retaining partial temporal relationships among the compressed features; by integrating a multi-head attention mechanism, it mitigates the memory limitation of recurrent networks. The model runs in two stages. The pre-processing stage segments the Dalvik opcode stream and computes per-block statistics. The detection stage feeds the blocks into a Bi-LSTM for semantic extraction, compressing the original opcode sequence into a sequence of semantically rich blocks that serves as the classification feature for the downstream classifier. The classifier is a modified Transformer: its multi-head attention mechanism focuses efficiently on the sequence features, and a global pooling layer is added afterwards to strengthen the model's sensitivity to the data and to perform dimensionality reduction, reducing overfitting. Experimental results show an accuracy of 99.63%, outperforming image-based deep learning methods and effectively alleviating the vanishing gradient problem.
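
As a rough illustration of the architecture the abstract describes, the following PyTorch sketch combines the two stages: a Bi-LSTM that compresses each Dalvik opcode block into a semantic vector, followed by a multi-head-attention encoder and global average pooling over the block sequence. All class and variable names, layer sizes, and the choice of PyTorch are assumptions made for illustration; the paper's exact configuration is not given in the abstract.

    import torch
    import torch.nn as nn

    class BlockAttentionDetector(nn.Module):
        """Minimal sketch of the two-stage detector described in the abstract:
        (1) a Bi-LSTM compresses each Dalvik opcode block into one semantic
        vector, and (2) a multi-head-attention encoder with global pooling
        classifies the resulting block sequence. All sizes are assumptions."""

        def __init__(self, vocab_size=256, embed_dim=64, lstm_hidden=64,
                     num_heads=4, num_classes=2):
            super().__init__()
            self.embed = nn.Embedding(vocab_size, embed_dim)
            # Stage 1: Bi-LSTM reads the opcodes inside one block and
            # emits a compressed semantic vector per block.
            self.block_lstm = nn.LSTM(embed_dim, lstm_hidden,
                                      batch_first=True, bidirectional=True)
            d_model = 2 * lstm_hidden
            # Stage 2: multi-head self-attention over the block sequence,
            # standing in for the paper's modified Transformer classifier.
            self.encoder = nn.TransformerEncoderLayer(
                d_model=d_model, nhead=num_heads, batch_first=True)
            self.classifier = nn.Linear(d_model, num_classes)

        def forward(self, opcode_blocks):
            # opcode_blocks: (batch, num_blocks, block_len) opcode IDs
            b, n, l = opcode_blocks.shape
            x = self.embed(opcode_blocks.reshape(b * n, l))   # (b*n, l, E)
            _, (h, _) = self.block_lstm(x)                    # h: (2, b*n, H)
            # Concatenate forward/backward final states -> one vector per block.
            blocks = torch.cat([h[0], h[1]], dim=-1).reshape(b, n, -1)
            z = self.encoder(blocks)                          # (b, n, 2H)
            # Global average pooling over blocks reduces dimensionality
            # before classification, as the abstract describes.
            pooled = z.mean(dim=1)                            # (b, 2H)
            return self.classifier(pooled)                    # (b, num_classes)

    model = BlockAttentionDetector()
    dummy = torch.randint(0, 256, (8, 32, 50))  # 8 apps, 32 blocks of 50 opcodes
    logits = model(dummy)                       # shape: (8, 2)

Pooling over the block dimension rather than keeping per-block outputs is one plausible reading of the abstract's "Global Pooling Layer"; the paper itself may use max pooling or a different placement.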