Academic Paper

Leveraging Phone Numbers for Spam detection in Online Social Networks
Document Type
Conference
Source
2021 IEEE 19th World Symposium on Applied Machine Intelligence and Informatics (SAMI), pp. 000119-000124, January 2021
Subject
Bioengineering
Communication, Networking and Broadcast Technologies
Computing and Processing
Engineering Profession
Robotics and Control Systems
Signal Processing and Analysis
Social networking (online)
Multimedia Web sites
Production
Manuals
Resource management
Monitoring
Machine intelligence
Online Social Networks
Social Spam
Latent Dirichlet Allocation
Abstract
Online Social Networks (OSNs) are platforms that have gained immense traction in today's society. Social media has reshaped our social world and plays a pivotal role in shaping our personal and professional goals. While it provides invaluable information to millions of individuals daily, it has also become one of the most popular venues for spam campaigns. In light of the malicious activity that is degrading our online experience, we design an algorithm and build a system for recognizing spam campaigns, with a specific focus on a phone-number-based approach. The study draws on data collected by monitoring three social networking channels (Tumblr, Twitter, and Flickr) and analyzes spam posts accumulated over four months. We collected over 18 million spam posts and used regular expressions to clean the data and select the posts that contain phone numbers. Next, we applied Latent Dirichlet Allocation (LDA), a Bayesian statistical model, to detect the category of the posts. We further represent the posts using bag-of-words and tf-idf and apply cosine similarity as the similarity measure.
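The abstract does not give the regular expressions used to find phone numbers. As a minimal sketch, assuming North American style numbers (an assumption; spam in the collected data may well use other formats), a pattern like the hypothetical `PHONE_RE` below can both detect numbers and normalize them to bare digits. The names `PHONE_RE`, `normalize`, `extract_phone_numbers`, and `filter_posts_with_phones` are illustrative and do not come from the paper.

```python
import re

# Hypothetical pattern (not the paper's): North American style numbers such as
# "555-123-4567", "(555) 123 4567", "+1 555.123.4567", or "5551234567".
PHONE_RE = re.compile(
    r"""
    (?<!\d)                 # not preceded by another digit
    (?:\+?1[\s.-]?)?        # optional country code
    (?:\(\d{3}\)|\d{3})     # area code, with or without parentheses
    [\s.-]?                 # optional separator
    \d{3}                   # exchange
    [\s.-]?                 # optional separator
    \d{4}                   # line number
    (?!\d)                  # not followed by another digit
    """,
    re.VERBOSE,
)


def normalize(number: str) -> str:
    """Keep only the digits and drop a leading US country code."""
    digits = re.sub(r"\D", "", number)
    return digits[1:] if len(digits) == 11 and digits.startswith("1") else digits


def extract_phone_numbers(post: str) -> list[str]:
    """Return the normalized phone numbers found in a post, in order of appearance."""
    return [normalize(m.group()) for m in PHONE_RE.finditer(post)]


def filter_posts_with_phones(posts):
    """Keep only posts containing at least one phone number, paired with those numbers."""
    return [(p, nums) for p in posts if (nums := extract_phone_numbers(p))]
```

Normalizing to digits means that "(555) 123-4567" and "555.123.4567" map to the same key, which is what makes a phone number usable as a campaign identifier across differently formatted posts.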
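The paper does not state how its LDA model is fitted or which hyperparameters it uses. The sketch below fits LDA with collapsed Gibbs sampling, one standard inference method for LDA, on posts that are already tokenized; the function name `lda_gibbs`, the topic count `k`, and the Dirichlet priors `alpha` and `beta` are illustrative assumptions. A post's category can then be read off as the topic with the largest weight in its mixture `theta[d]`.

```python
import random


def lda_gibbs(docs, k, alpha=0.1, beta=0.01, iters=200, seed=0):
    """Fit LDA to tokenized documents with collapsed Gibbs sampling (sketch).

    Returns (theta, vocab, phi): per-document topic mixtures, the sorted
    vocabulary, and per-topic word distributions over that vocabulary.
    """
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    wid = {w: i for i, w in enumerate(vocab)}
    v = len(vocab)
    ndk = [[0] * k for _ in docs]       # topic counts per document
    nkw = [[0] * v for _ in range(k)]   # word counts per topic
    nk = [0] * k                        # total tokens assigned to each topic
    z = []                              # current topic of every token

    def update(d, t, x, delta):
        ndk[d][t] += delta
        nkw[t][x] += delta
        nk[t] += delta

    # Random initial topic assignment for every token.
    for d, doc in enumerate(docs):
        z.append([rng.randrange(k) for _ in doc])
        for w, t in zip(doc, z[d]):
            update(d, t, wid[w], +1)

    for _ in range(iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                x = wid[w]
                update(d, z[d][i], x, -1)  # take the token out of the counts
                # Full conditional p(z_i = j | all other assignments), unnormalized.
                weights = [
                    (ndk[d][j] + alpha) * (nkw[j][x] + beta) / (nk[j] + v * beta)
                    for j in range(k)
                ]
                z[d][i] = rng.choices(range(k), weights=weights)[0]
                update(d, z[d][i], x, +1)  # put it back under the sampled topic

    theta = [[(ndk[d][j] + alpha) / (len(doc) + k * alpha) for j in range(k)]
             for d, doc in enumerate(docs)]
    phi = [[(nkw[j][x] + beta) / (nk[j] + v * beta) for x in range(v)]
           for j in range(k)]
    return theta, vocab, phi
```

Because the sampler is stochastic, topic labels are arbitrary and results vary with `seed`; in practice a library implementation and a tuned `k` would replace this toy version.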
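To illustrate the similarity step, here is a standard-library-only sketch that builds bag-of-words counts, reweights them with tf-idf, and compares posts with cosine similarity, cos(u, v) = u·v / (‖u‖‖v‖). The simple tokenizer and the smoothed idf, ln((1 + N) / (1 + df)) + 1 (the form scikit-learn's `TfidfVectorizer` uses by default), are assumptions; the abstract does not specify the paper's exact weighting.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    """Lowercase and split on runs of letters and digits (deliberately simple)."""
    return re.findall(r"[a-z0-9]+", text.lower())


def bag_of_words(docs):
    """Raw term counts for each document."""
    return [Counter(tokenize(d)) for d in docs]


def tf_idf(bows):
    """Scale each count by a smoothed idf: ln((1 + N) / (1 + df)) + 1."""
    n = len(bows)
    df = Counter(term for bow in bows for term in bow)
    idf = {t: math.log((1 + n) / (1 + c)) + 1 for t, c in df.items()}
    return [{t: tf * idf[t] for t, tf in bow.items()} for bow in bows]


def cosine(u, v):
    """Cosine similarity of two sparse vectors stored as term -> weight dicts."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0
```

Posts advertising the same offer share many weighted terms and score close to 1, while unrelated posts score near 0; a threshold on this score (not given in the abstract) could be used to group posts into candidate campaigns.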