
Xtractor: A Two-Step Tweet Extractor for Sentiment Analysis
2024 5th International Conference on Advancements in Computational Sciences (ICACS), pp. 1-6, Feb. 2024
Twitter (now X) has grown steadily in popularity since its launch in 2006. People use Twitter as an instant repository for collecting data and gaining insight into public opinion on trending issues. Although Twitter allows access to its data through streaming and REST APIs, extracting the required data is difficult. Each tweet returned by Twitter is in JSON format and has at least 26 fields, each stored in a dictionary-style data structure. Tweet metadata such as “likes” and “retweets” increases the volume and complexity of a tweet, making it harder to extract data in a clean format. This work aims to develop an effective two-step tweet extractor (Xtractor: an online-offline keyword-based Twitter data extractor). In the first step, Xtractor collects publicly available topical tweets on the fly using keywords, hashtags, or lists of them, parses all the tweet fields, and filters them to retain the potentially useful field contents without additional payload. In the second step, Xtractor evicts partially matching tweets using regular expressions to obtain the targeted domain-specific tweets. Using the proposed two-step Twitter data extraction method, datasets concerning the people of Pakistan have been collected, which can be leveraged to gain insight for decision-making, especially in the context of sentiment analysis. We found this method efficient, capable, and productive.
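
The two steps described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the retained field list (`KEEP_FIELDS`), the helper names (`slim_tweet`, `build_matcher`, `filter_tweets`), and the whole-token matching rule are assumptions made for illustration, and the collection step (calling the Twitter API) is omitted, so the input is taken to be a list of raw JSON strings, one per tweet.

```python
import json
import re

# Hypothetical choice of fields to retain; the abstract does not list them.
KEEP_FIELDS = ("id_str", "created_at", "text", "lang")


def slim_tweet(raw_json):
    """Step 1 (sketch): parse one tweet and keep only the chosen fields."""
    tweet = json.loads(raw_json)
    return {key: tweet[key] for key in KEEP_FIELDS if key in tweet}


def build_matcher(keywords):
    """Step 2 (sketch): match keywords only as whole tokens, ignoring case.

    The lookarounds reject partial matches such as "Pakistani" for the
    keyword "Pakistan". This matching rule is an assumption.
    """
    ordered = sorted(keywords, key=len, reverse=True)
    alternatives = "|".join(re.escape(word) for word in ordered)
    return re.compile(r"(?<!\w)(?:" + alternatives + r")(?!\w)", re.IGNORECASE)


def filter_tweets(raw_tweets, keywords):
    """Run both steps: slim every tweet, then evict partial matches."""
    matcher = build_matcher(keywords)
    slimmed = (slim_tweet(raw) for raw in raw_tweets)
    return [tweet for tweet in slimmed if matcher.search(tweet.get("text", ""))]
```

Under these assumptions, calling `filter_tweets(raw_tweets, ["Pakistan"])` keeps a tweet whose text contains "Pakistan" or "#pakistan" and drops one that only mentions "Pakistani", while engagement counters and nested user objects are discarded in the first step.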