Academic Paper

Multilabel Aggressive Text Classification from Social Media using Transformer-based Approaches
Document Type
Conference
Source
2023 26th International Conference on Computer and Information Technology (ICCIT), pp. 1-6, Dec. 2023
Subject
Aerospace
Bioengineering
Communication, Networking and Broadcast Technologies
Components, Circuits, Devices and Systems
Computing and Processing
Engineered Materials, Dielectrics and Plasmas
Fields, Waves and Electromagnetics
Photonics and Electrooptics
Power, Energy and Industry Applications
Robotics and Control Systems
Signal Processing and Analysis
Social networking (online)
Computational modeling
Text categorization
Predictive models
Transformers
Data models
Task analysis
Natural language processing
Aggressive text classification
Deep learning
Text processing
Text corpora
Language
Abstract
Multilabel aggressive text on social media has a detrimental societal impact, prompting government agencies and tech corporations to take measures against its spread. To date, research has focused on high-resource languages such as English, leaving low-resource languages such as Bengali largely unexplored. This work presents a transformer-based technique that classifies multilabel aggressive texts in Bengali by their targets. A dataset (EM-BAD) of 13,728 texts is developed with five target classes: Religious Aggression (ReAG), Political Aggression (PoAG), Verbal Aggression (VeAG), Gender Aggression (GeAG), and Racial Aggression (RaAG). Experimental results demonstrate that Bangla-BERT with an adjusted pooling layer and fine-tuning outperforms all ML, DL, and transformer-based baselines and existing techniques, achieving the highest weighted F1-score of 0.89 on the multilabel aggressive text classification task.
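For readers unfamiliar with the reported metric, the following is a minimal sketch of how a weighted F1-score is commonly computed for multilabel predictions: per-label F1 values are averaged, each weighted by the number of true instances of that label. The function name `weighted_f1` and the toy data are illustrative assumptions, not the authors' evaluation code; only the five label names come from the abstract.

```python
# Label order is an assumption made for this sketch; the names come from the abstract.
LABELS = ["ReAG", "PoAG", "VeAG", "GeAG", "RaAG"]


def weighted_f1(y_true, y_pred):
    """Support-weighted mean of per-label F1 over 0/1 indicator rows.

    Hypothetical helper: each row is one text, each column one label.
    A label with no true or predicted positives gets F1 = 0.
    """
    n_labels = len(y_true[0])
    weighted_sum = 0.0
    total_support = 0
    for j in range(n_labels):
        pairs = [(t[j], p[j]) for t, p in zip(y_true, y_pred)]
        tp = sum(1 for t, p in pairs if t == 1 and p == 1)
        fp = sum(1 for t, p in pairs if t == 0 and p == 1)
        fn = sum(1 for t, p in pairs if t == 1 and p == 0)
        support = tp + fn  # number of true instances of label j
        denom = 2 * tp + fp + fn
        f1 = 2 * tp / denom if denom else 0.0
        weighted_sum += support * f1
        total_support += support
    return weighted_sum / total_support if total_support else 0.0


# Toy example: three texts; the second carries two labels at once.
y_true = [[1, 0, 0, 0, 0], [1, 1, 0, 0, 0], [0, 1, 0, 0, 0]]
y_pred = [[1, 0, 0, 0, 0], [1, 0, 0, 0, 0], [0, 1, 0, 0, 0]]
score = weighted_f1(y_true, y_pred)
```

Weighting by support means frequent classes dominate the score, so a high weighted F1 can coexist with weaker performance on rare classes. The same quantity is what scikit-learn's `f1_score` returns with `average="weighted"` on indicator matrices.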