Academic paper

Automated Microsoft Office macro malware detection using machine learning
Document Type
Conference
Source
2017 IEEE International Conference on Big Data (Big Data), pp. 4448-4452, Dec. 2017
Subject
Aerospace
Bioengineering
Computing and Processing
General Topics for Engineers
Geoscience
Signal Processing and Analysis
Transportation
Malware
Feature extraction
Security
Machine learning algorithms
Testing
Classification algorithms
macro
malware
Microsoft Office
machine learning
p-code
Language
Abstract
Macro malware in Microsoft (MS) Office files has long persisted as a cybersecurity threat. Though it ebbed after its initial rampages around the turn of the century, it has reemerged as a threat. Attackers are taking a persuasive approach, using document engineering aided by improved data mining methods to make malicious MS Office files appear legitimate. Recent attacks have targeted specific corporations with malicious documents containing unusually relevant information. This development undermines users' ability to distinguish malicious from legitimate MS Office files and intensifies the need to automate macro malware detection. This study proposes a method for classifying MS Office files that contain macros as malicious or benign. The method combines the K-Nearest Neighbors machine learning algorithm, feature selection, and TF-IDF weighting, with n-grams of p-code opcodes (the translated form of VBA macro code) serving as the file features. The study achieves 96.3% file classification accuracy on a sample set of 40 malicious and 118 benign MS Office files containing macros, demonstrating the effectiveness of this approach as a potential defense against macro malware. Finally, it discusses the challenges that automated macro malware detection faces and possible solutions.
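The pipeline named in the abstract (opcode n-grams, TF-IDF weighting, K-Nearest Neighbors) can be illustrated with a minimal sketch. This is not the authors' implementation: the opcode tokens are invented toy data rather than real p-code disassembly, the n-gram length, the value of k, the smoothed IDF formula, and the cosine similarity measure are all assumptions, the function names are hypothetical, and the feature selection step is omitted.

```python
import math
from collections import Counter

def ngrams(opcodes, n=2):
    """Return the opcode n-grams of one macro as space-joined strings."""
    return [" ".join(opcodes[i:i + n]) for i in range(len(opcodes) - n + 1)]

def fit_idf(docs):
    """Smoothed inverse document frequency for every n-gram seen in training."""
    df = Counter(term for doc in docs for term in set(doc))
    total = len(docs)
    return {t: math.log((1 + total) / (1 + c)) + 1.0 for t, c in df.items()}

def tfidf(doc, idf):
    """L2-normalised sparse TF-IDF vector; n-grams unseen in training are dropped."""
    counts = Counter(t for t in doc if t in idf)
    vec = {t: c * idf[t] for t, c in counts.items()}
    norm = math.sqrt(sum(v * v for v in vec.values()))
    return {t: v / norm for t, v in vec.items()} if norm else {}

def cosine(a, b):
    """Cosine similarity of two already-normalised sparse vectors."""
    if len(a) > len(b):
        a, b = b, a
    return sum(v * b.get(t, 0.0) for t, v in a.items())

def knn_predict(query, train_vecs, train_labels, k=3):
    """Majority vote among the k training vectors most similar to the query."""
    ranked = sorted(zip(train_vecs, train_labels),
                    key=lambda pair: cosine(query, pair[0]), reverse=True)
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]

# Toy opcode sequences, invented for illustration only (assumption).
train = [
    (["LitStr", "ArgsCall", "LitStr", "ArgsCall", "Shell"], "malicious"),
    (["LitStr", "ArgsCall", "Shell", "LitStr", "ArgsCall"], "malicious"),
    (["Ld", "LitDI2", "Add", "St", "Ld"], "benign"),
    (["Ld", "LitDI2", "Add", "St", "ExitSub"], "benign"),
]
docs = [ngrams(ops) for ops, _ in train]
labels = [label for _, label in train]
idf = fit_idf(docs)
vecs = [tfidf(d, idf) for d in docs]

def classify(opcodes, k=3):
    """Classify one opcode sequence against the toy training set."""
    return knn_predict(tfidf(ngrams(opcodes), idf), vecs, labels, k)
```

Calling `classify` on a new opcode sequence returns the majority label of its nearest training files. On real data, the opcode sequences would come from disassembling each file's macro p-code, and the vocabulary would typically be pruned by feature selection before the nearest-neighbor step.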