Academic Paper

Machine Learning & Concept Drift based Approach for Malicious Website Detection

Document Type

Conference

Author

Singhal, Siddharth; Chawla, Utkarsh; Shorey, Rajeev

Source

2020 International Conference on COMmunication Systems & NETworkS (COMSNETS), pp. 582-585, Jan. 2020

Subject

Communication, Networking and Broadcast Technologies
Feature extraction
Uniform resource locators
Forestry
Supervised learning
Malware
Training data
Machine learning
URL Feature Extraction
Malicious Website Detection
Concept Drifts
Feature Vectors
Gradient Boosted Trees
Random Forest
Feedforward Neural Networks

Language

ISSN

2155-2509

Abstract

The rapid growth in the number of available cyber attack vectors and in the frequency of cyber attacks necessitates robust cybersecurity systems. Malicious websites are a significant threat to cybersecurity. Attackers use them for illegal activities such as disrupting systems by implanting malware, gaining unauthorized access to systems, or illegally collecting personal information. We propose and implement an approach for classifying websites as malicious or benign given their Uniform Resource Locator (URL) as input. Using the URL provided by the user, we collect lexical, host-based, and content-based features for the website. These features are fed as input to a supervised machine learning algorithm that classifies the URL as malicious or benign. The models are trained on a dataset consisting of multiple malicious and benign URLs. We evaluate the classification accuracy of Random Forest, Gradient Boosted Decision Tree, and Deep Neural Network classifiers. One weakness of using machine learning for detection is that the same training data is available to attackers. Attackers can exploit this data to alter the features associated with malicious URLs so that the supervised learning algorithms classify them as benign. Further, owing to the dynamic nature of malicious websites, we also propose a paradigm for detecting and countering these manually induced concept drifts.
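
To make the lexical feature step concrete, the following is a minimal sketch of deriving a feature vector from a URL string alone. The abstract does not list the paper's features, so every feature name here (`url_length`, `path_depth`, `char_entropy`, and so on) and the function name `lexical_features` are illustrative assumptions, not the authors' feature set. Host-based and content-based features would need network lookups and are left out.

```python
import math
from collections import Counter
from urllib.parse import urlparse


def lexical_features(url: str) -> dict:
    """Compute simple lexical features from a URL string.

    Hypothetical feature set for illustration only; the paper's
    actual features are not given in the abstract.
    """
    parsed = urlparse(url)
    host = parsed.hostname or ""

    # Shannon entropy of the characters in the full URL string.
    counts = Counter(url)
    total = len(url)
    entropy = 0.0
    if total:
        entropy = -sum(
            (c / total) * math.log2(c / total) for c in counts.values()
        )

    return {
        "url_length": total,
        "host_length": len(host),
        "num_dots_in_host": host.count("."),
        "num_digits": sum(ch.isdigit() for ch in url),
        "has_at_symbol": "@" in url,
        "uses_https": parsed.scheme == "https",
        # Number of non-empty path segments, e.g. "/a/b" has depth 2.
        "path_depth": len([seg for seg in parsed.path.split("/") if seg]),
        "char_entropy": entropy,
    }


features = lexical_features("http://login.example.com/a/b?id=123")
```

A dictionary like `features` could be turned into a numeric vector and passed to any supervised classifier, such as the tree ensembles or neural networks named in the abstract.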

