Academic Paper

Android Authorship Attribution Using Source Code-Based Features
Document Type
Periodical
Author
Source
IEEE Access. 12:6569-6589, 2024
Subject
Aerospace
Bioengineering
Communication, Networking and Broadcast Technologies
Components, Circuits, Devices and Systems
Computing and Processing
Engineered Materials, Dielectrics and Plasmas
Engineering Profession
Fields, Waves and Electromagnetics
General Topics for Engineers
Geoscience
Nuclear Engineering
Photonics and Electrooptics
Power, Energy and Industry Applications
Robotics and Control Systems
Signal Processing and Analysis
Transportation
Operating systems
Feature extraction
Source coding
Libraries
Malware
Metadata
Object recognition
Androids
Android
authorship attribution
mobile malware
metadata
obfuscation
source code-based
Language
ISSN
2169-3536
Abstract
With the widespread use of mobile devices, Android has become the most popular operating system, and new applications are uploaded to the Android market every day. However, because Android binaries are easy to modify and repackage, Android applications can readily be altered and imitated by other developers and released in third-party Android markets. Determining the original developers of Android applications is therefore a challenging problem, known as authorship attribution. This study explores the distinctive features of Android applications in order to identify their authors. Software developers generally leave a footprint in their applications that reflects their writing style. This footprint, which can be extracted from either the source code or the binary code, can help identify the authors of software applications. Since obtaining the source code of applications in the wild can be impractical, especially when dealing with malware, researchers prefer to focus on application binaries. This study accordingly proposes an approach that identifies Android developers by deriving a wide range of features from different parts of Android applications, such as smali files, libraries, manifest files, and metadata. In addition, other features, namely configuration, dex code, resource-based, and string-related features, are inherited from other studies on Android authorship attribution and fused with the proposed feature set. The proposed approach was evaluated on benign and malware datasets, and its results were compared with those of other studies. The results show that the proposed features increase accuracy, which reaches 82.5% on the market dataset and 95.6% on the malware dataset. These results demonstrate the positive impact of the proposed features on Android authorship attribution.
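The idea of deriving features from different parts of an application and fusing them into one feature set can be illustrated with a minimal sketch. It is not the paper's feature set or implementation: the function names (`manifest_features`, `smali_opcode_features`, `fuse`), the chosen features, and the `manifest.` / `smali.` key prefixes are all hypothetical, and the input is assumed to be an already decoded `AndroidManifest.xml` and disassembled smali text.

```python
import xml.etree.ElementTree as ET
from collections import Counter


def manifest_features(manifest_xml: str) -> dict:
    """Count permissions and components declared in a decoded manifest.

    Hypothetical feature choice, for illustration only.
    """
    root = ET.fromstring(manifest_xml)
    app = root.find("application")
    features = {
        "manifest.num_permissions": len(root.findall("uses-permission")),
    }
    for tag in ("activity", "service", "receiver", "provider"):
        count = len(app.findall(tag)) if app is not None else 0
        features[f"manifest.num_{tag}"] = count
    return features


def smali_opcode_features(smali_text: str) -> dict:
    """Relative frequency of each opcode in a piece of smali text.

    Simplified: skips blank lines, directives ('.'), comments ('#') and
    labels (':'); it does not treat annotation or array-data bodies specially.
    """
    opcodes = Counter()
    for line in smali_text.splitlines():
        line = line.strip()
        if not line or line.startswith((".", "#", ":")):
            continue
        opcodes[line.split()[0]] += 1
    total = sum(opcodes.values())
    if total == 0:
        return {}
    return {f"smali.{op}": n / total for op, n in opcodes.items()}


def fuse(*feature_sets: dict) -> dict:
    """Merge several per-source feature dicts into one feature set."""
    fused = {}
    for feature_set in feature_sets:
        fused.update(feature_set)
    return fused
```

The fused dictionary could then be vectorized and passed to any classifier; the key prefixes keep features from different sources from colliding when they are merged.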