Academic Paper

Normalization of Unstructured Indonesian Tweet Text For Presidential Candidates Sentiment Analysis
Document Type
Conference
Source
2019 7th International Conference on Cyber and IT Service Management (CITSM). 7:1-6 Nov, 2019
Subject
Communication, Networking and Broadcast Technologies
Computing and Processing
General Topics for Engineers
Robotics and Control Systems
Dictionaries
Sentiment analysis
Twitter
Standards
Informatics
Filtering
Voting
Sentiment Analysis
opinion mining
Naïve Bayes Classifier
Lexicon
Confusion Matrix
Language
Abstract
Indonesian tweets contain a large amount of unstructured text. This research proposes a pre-processing task for cleaning abnormal text from tweets. First, we apply the common pre-processing steps (case folding, filtering, tokenizing). Second, we apply normalization: in each document, words with excess letters, abbreviations, run-together words, and slang words are converted into standard words, and words or letters that carry no meaning are deleted. Once the text is in normal form, stopword removal and stemming are carried out. The data are taken from 2018 and 2019. The highest accuracy (81%) was obtained on the 2018 data, with 425 training tweets and 100 testing tweets, and positive sentiment toward Prabowo's electability was 52%. We interpret this result to mean that Prabowo deserves to put himself forward as a presidential candidate.
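The pre-processing order described in the abstract (case folding, filtering, tokenizing, normalization, stopword removal, stemming) can be illustrated with a minimal Python sketch. This is not the authors' implementation. The lookup tables NORMALIZATION_DICT and STOPWORDS, the function names, and the rule that collapses runs of three or more identical letters are all assumptions made for illustration. The step that splits run-together words is omitted, and stemming is left as a pluggable function because the record does not say which stemmer was used.

```python
import re

# Hypothetical lookup tables; the paper's actual dictionaries are not given here.
NORMALIZATION_DICT = {
    "yg": "yang",    # abbreviation
    "bgt": "sangat", # abbreviation of a colloquial intensifier
    "gak": "tidak",  # slang
}
STOPWORDS = {"yang", "dan", "di", "ke", "itu"}


def case_fold(text):
    """Lowercase the whole tweet."""
    return text.lower()


def filter_text(text):
    """Remove URLs, mentions, hashtags, and anything that is not a letter.

    Expects text that has already been case folded.
    """
    text = re.sub(r"https?://\S+", " ", text)
    text = re.sub(r"[@#]\w+", " ", text)
    return re.sub(r"[^a-z\s]", " ", text)


def tokenize(text):
    """Split on whitespace."""
    return text.split()


def squeeze_repeats(token):
    """Collapse runs of 3+ identical letters to one (assumed heuristic)."""
    return re.sub(r"(.)\1{2,}", r"\1", token)


def normalize(tokens):
    """Map non-standard tokens to standard words; drop single letters."""
    result = []
    for token in tokens:
        token = squeeze_repeats(token)
        token = NORMALIZATION_DICT.get(token, token)
        if len(token) > 1:  # a lone letter is treated as meaningless
            result.append(token)
    return result


def preprocess(tweet, stem=lambda word: word):
    """Run the steps in the order given in the abstract.

    `stem` defaults to the identity function; a real Indonesian
    stemmer would be passed in here.
    """
    tokens = tokenize(filter_text(case_fold(tweet)))
    tokens = normalize(tokens)
    tokens = [t for t in tokens if t not in STOPWORDS]
    return [stem(t) for t in tokens]


if __name__ == "__main__":
    print(preprocess("Prabowo baguuus bgt yg gak http://t.co/x @user !!!"))
```

Normalization runs before stopword removal so that an abbreviation such as "yg" is first expanded to "yang" and only then recognized as a stopword. The three-letter threshold in squeeze_repeats is a deliberate compromise: it leaves legitimate double letters (as in "maaf") intact, at the cost of missing elongations that repeat a letter only twice.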