Academic Paper

Assessing Accuracy: A Study of Lexicon and Rule-Based Packages in R and Python for Sentiment Analysis
Document Type
Periodical
Source
IEEE Access, vol. 12, pp. 20169-20180, 2024
Subject
Aerospace
Bioengineering
Communication, Networking and Broadcast Technologies
Components, Circuits, Devices and Systems
Computing and Processing
Engineered Materials, Dielectrics and Plasmas
Engineering Profession
Fields, Waves and Electromagnetics
General Topics for Engineers
Geoscience
Nuclear Engineering
Photonics and Electrooptics
Power, Energy and Industry Applications
Robotics and Control Systems
Signal Processing and Analysis
Transportation
Sentiment analysis
Python
Libraries
Dictionaries
Machine learning
Time complexity
Support vector machines
lexicon and rule based
sentiment analysis by R
sentiment analysis by python
VADER
sentimentr
Language
ISSN
2169-3536
Abstract
Sentiment analysis has become a focal point of interdisciplinary research, prompting the use of diverse methodologies and the continual emergence of programming language packages. Python and R in particular offer comprehensive packages in this area. In this study, we analyze established packages in these languages, focusing on accuracy while also considering time complexity. Experiments on seven distinct datasets show that the accuracy of these packages varies significantly with the dataset used. Among them, the ‘sentimentr’ package performs consistently well across diverse datasets, while Python libraries generally process data faster. Although these packages classify sentences as positive or negative well, they struggle to capture sentiment intensity. Our findings also highlight a prevalent trend of overfitting: the packages excel on familiar datasets but struggle on unfamiliar ones.