Academic Paper

Xtractor: A Two-Step Tweet Extractor for Sentiment Analysis
Document Type
Conference
Source
2024 5th International Conference on Advancements in Computational Sciences (ICACS), pp. 1-6, Feb. 2024
Subject
Computing and Processing
Robotics and Control Systems
Sentiment analysis
Matched filters
Dictionaries
Social networking (online)
Blogs
Decision making
Metadata
Python
sentiment analysis
social media network
Twitter data extractor
X platform
Abstract
Twitter (now X) has grown steadily in popularity since its inception in 2006. People use Twitter as a readily available source of data for gaining insight into public opinion on trending issues. Although Twitter allows access to its data through streaming and REST APIs, extracting the required data is difficult. Each tweet returned by Twitter is in JSON format and has at least 26 fields, each stored as a dictionary-style data structure. Tweet metadata such as “likes” and “retweets” increases the volume and complexity of a tweet, making it harder to extract data in a clean format. This work aims to develop an effective two-step tweet extractor (Xtractor: an online-offline keyword-based Twitter data extractor). In the first step, Xtractor collects publicly available topical tweets on the fly using keywords, hashtags, or lists of them, parses all the tweet fields, and filters them to retain the potentially useful field contents without additional payload. In the second step, Xtractor discards partially matching tweets using regular expressions to obtain the targeted domain-specific tweets. Using the proposed two-step Twitter data extraction method, datasets concerning the people of Pakistan have been collected, which can be leveraged to gain insight for decision-making, especially in the context of sentiment analysis. We found this method efficient, high-capacity, and productive.
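The two steps described in the abstract might be sketched roughly as follows. This is an illustrative sketch only, not the authors' implementation: the helper names (`slim_tweet`, `build_matcher`, `filter_tweets`), the choice of retained fields, and the reading of "partially matching" as "keyword appears only inside a longer word" are all assumptions. The field names follow the classic tweet JSON layout and may differ in newer API versions.

```python
import json
import re

# Assumed subset of fields to retain; the abstract does not list the actual choice.
KEEP_FIELDS = ("id_str", "created_at", "text", "lang",
               "retweet_count", "favorite_count")


def slim_tweet(raw_json):
    """Step 1 (sketch): parse one tweet and keep only the selected fields."""
    tweet = json.loads(raw_json)
    slim = {key: tweet.get(key) for key in KEEP_FIELDS}
    # Nested dictionaries such as "user" are reduced to a single value.
    slim["user"] = (tweet.get("user") or {}).get("screen_name")
    return slim


def build_matcher(keywords):
    """Step 2 (sketch): regex that accepts keywords only as whole tokens."""
    alternatives = "|".join(re.escape(k) for k in keywords)
    # The lookarounds reject a keyword embedded in a longer word,
    # e.g. "Pakistani" when the keyword is "Pakistan".
    return re.compile(r"(?<!\w)(?:" + alternatives + r")(?!\w)", re.IGNORECASE)


def filter_tweets(tweets, keywords):
    """Drop tweets whose text contains no whole-token keyword match."""
    matcher = build_matcher(keywords)
    return [t for t in tweets if matcher.search(t.get("text") or "")]
```

In this sketch, `slim_tweet` would be applied to each tweet as it arrives (the online step), and `filter_tweets` would run later over the stored, slimmed tweets (the offline step). Whether hashtags such as `#Pakistan` should count as matches for the keyword `Pakistan` is a design choice; here they do, because `#` is not a word character.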