Academic Paper

Comparative Performance of Machine Learning Methods for Text Classification
Document Type
Conference
Source
2020 International Conference on Computing and Information Technology (ICCIT-1441), pp. 1-5, Sep. 2020
Subject
Communication, Networking and Broadcast Technologies
Components, Circuits, Devices and Systems
Computing and Processing
Engineering Profession
Robotics and Control Systems
Signal Processing and Analysis
Matrices
Ciphers
Additives
Matrix converters
Computer science
Nickel
Natural language processing
Text mining
Data mining
machine learning
deep learning
Language
Abstract
Machine learning methods, including deep learning, are popular for data processing. Deep learning methods have shown great promise when applied to natural language processing (NLP) tasks. Text classification is a supervised machine learning task that uses labelled documents to train a classifier. Previous work on machine learning and deep learning methods for text classification has been tested with relatively small-sized data instances. In this paper, we compare the performance of machine learning and deep learning algorithms on a text classification task. We also study and compare the scalability of these methods with respect to larger data instances. We used support vector machines (SVM), logistic regression, random forest, and Naïve Bayes as the machine learning algorithms, and a convolutional neural network (CNN) as the deep learning method. The task was a multi-class classification problem with six (6) classes and six thousand (6,000) data instances, with an average of 20 sentences in each data instance. The CNN outperformed all the machine learning algorithms, achieving an accuracy of over 85%. We attribute this to the filter weights being learned and updated through backpropagation in each epoch, which yields better results than the traditional methods.
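
As a rough illustration of the kind of baseline the abstract mentions, the sketch below implements a multi-class Naïve Bayes text classifier in plain Python. It is not the authors' pipeline: the abstract does not state the tokenization, features, or smoothing used, so the whitespace tokenizer, add-one smoothing, the `NaiveBayesText` class, and the three-class toy corpus are all assumptions made for illustration.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase and split on whitespace (a deliberately naive tokenizer)."""
    return text.lower().split()


class NaiveBayesText:
    """Multinomial Naive Bayes over word counts with add-one smoothing."""

    def fit(self, docs, labels):
        self.class_docs = Counter(labels)        # number of documents per class
        self.word_counts = defaultdict(Counter)  # class -> word -> count
        for doc, label in zip(docs, labels):
            self.word_counts[label].update(tokenize(doc))
        self.vocab = {w for c in self.word_counts.values() for w in c}
        self.total_docs = len(labels)
        return self

    def predict(self, doc):
        scores = {}
        for label, n_docs in self.class_docs.items():
            counts = self.word_counts[label]
            denom = sum(counts.values()) + len(self.vocab)
            # Log prior plus the sum of smoothed log likelihoods.
            score = math.log(n_docs / self.total_docs)
            for word in tokenize(doc):
                if word in self.vocab:  # skip words never seen in training
                    score += math.log((counts[word] + 1) / denom)
            scores[label] = score
        return max(scores, key=scores.get)


def accuracy(model, docs, labels):
    """Fraction of documents whose predicted label matches the true label."""
    hits = sum(model.predict(d) == y for d, y in zip(docs, labels))
    return hits / len(labels)


# Hypothetical toy corpus with three classes; this is not the paper's data.
train_docs = [
    "the team won the football match",
    "the striker scored a late goal",
    "the central bank raised interest rates",
    "stock markets fell as inflation rose",
    "the new phone has a faster processor",
    "the software update fixes a security bug",
]
train_labels = ["sport", "sport", "finance", "finance", "tech", "tech"]

model = NaiveBayesText().fit(train_docs, train_labels)
```

Calling `model.predict("inflation and interest rates")` classifies an unseen sentence, and `accuracy(model, docs, labels)` gives the metric the abstract reports. A comparison like the one described would evaluate each classifier on held-out documents rather than on the training set.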