Academic Paper

Overview of the Mowjaz Multi-Topic Labelling Task
Document Type
Conference
Source
2021 12th International Conference on Information and Communication Systems (ICICS), pp. 502-508, May 2021
Subject
Communication, Networking and Broadcast Technologies
Components, Circuits, Devices and Systems
Computing and Processing
Support vector machines
Recurrent neural networks
Atmospheric measurements
Error analysis
Text categorization
Particle measurements
Natural language processing
Multi-label Text Classification
SVM
RNN
LSTM
GRU
AraVec
Arabic BERT
AraBERT
GigaBERT
ISSN
2573-3346
Abstract
Multi-label text classification is an important task in Natural Language Processing (NLP). One use case is categorizing news articles, where each article may belong to one or more classes. In this work, we present the ICICS2021 Mowjaz Multi-Topic Labelling Task. Given a news article, systems participating in this task are expected to select its topic(s). Systems are evaluated using the F1 score. In total, 46 teams registered on the task’s CodaLab page; of these, 28 teams submitted a total of 309 runs. The results are surprisingly high and very close to each other, with all teams achieving F1 scores between 0.7965 and 0.8567. Most systems used deep learning models, such as Recurrent Neural Networks (RNN), coupled with pretrained embeddings, such as those from BERT-based models. Only a few experimented with traditional machine learning models, such as Support Vector Machines (SVM) and Naive Bayes (NB).
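The abstract states that systems are ranked by F1 score but does not say which averaging scheme is used for the multi-label setting. The sketch below assumes micro-averaging, which pools true positives, false positives, and false negatives over all (article, topic) decisions before computing precision and recall. The function name `micro_f1` and the topic labels in the example are hypothetical and are not taken from the task's official scorer.

```python
from typing import List, Set


def micro_f1(gold: List[Set[str]], pred: List[Set[str]]) -> float:
    """Micro-averaged F1 for multi-label predictions.

    Assumption: micro-averaging is used here for illustration; the
    task's official averaging scheme is not stated in the abstract.
    Each list element holds the set of topics for one article.
    """
    if len(gold) != len(pred):
        raise ValueError("gold and pred must have the same length")
    tp = fp = fn = 0
    for g, p in zip(gold, pred):
        tp += len(g & p)  # topics predicted and correct
        fp += len(p - g)  # topics predicted but not in the gold set
        fn += len(g - p)  # gold topics the system missed
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    # Hypothetical topic labels for three news articles.
    gold = [{"Politics"}, {"Sports", "Health"}, {"Economy"}]
    pred = [{"Politics"}, {"Sports"}, {"Economy", "Politics"}]
    print(micro_f1(gold, pred))  # 3 TP, 1 FP, 1 FN -> 0.75
```

With macro-averaging, F1 would instead be computed separately for each topic and then averaged, which gives rare topics the same weight as frequent ones.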