Academic Paper

Improving K Nearest Neighbor into String Vector Version for Text Categorization
Document Type
Conference
Author
Source
2019 21st International Conference on Advanced Communication Technology (ICACT), pp. 1091-1097, Feb. 2019
Subject
Communication, Networking and Broadcast Technologies
Computing and Processing
Engineering Profession
String Vector
K Nearest Neighbor
Text Categorization
Language
ISSN
1738-9445
Abstract
This research is concerned with a string vector based version of KNN (K Nearest Neighbor) as an approach to text categorization. Traditionally, texts have been encoded into numerical vectors in order to use the traditional version of KNN, and this encoding leads to three main problems: huge dimensionality, sparse distribution, and poor transparency. To solve these problems, this research proposes that texts be encoded into string vectors, defines a similarity measure between string vectors, and modifies KNN into a version that takes string vectors as its input. The proposed KNN version is validated empirically by comparing it with the traditional KNN version on three collections: NewsPage.com, Opiniopsis, and 20NewsGroups. The goal of this research is to improve text categorization performance by solving these three problems.
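The idea in the abstract can be illustrated with a minimal Python sketch. The abstract does not say how string vectors are built or how their similarity is computed, so everything concrete here is an assumption made for illustration: a text is encoded as its `dim` most frequent words, two words count as similar only when they match exactly, and two string vectors are compared by averaging the position-wise word similarities. The function names (`to_string_vector`, `word_similarity`, `string_vector_similarity`, `knn_classify`) are hypothetical and are not taken from the paper.

```python
from collections import Counter


def to_string_vector(text, dim=3):
    """Encode a text as its `dim` most frequent words (an assumed encoding)."""
    counts = Counter(text.lower().split())
    # Rank by descending frequency; break ties alphabetically for determinism.
    ranked = sorted(counts, key=lambda w: (-counts[w], w))
    # Pad with empty strings so every string vector has the same length.
    return (ranked + [""] * dim)[:dim]


def word_similarity(a, b):
    """Placeholder word-level similarity: exact match only (an assumption)."""
    return 1.0 if a and a == b else 0.0


def string_vector_similarity(u, v):
    """Average the word-level similarities of elements at the same position."""
    return sum(word_similarity(a, b) for a, b in zip(u, v)) / len(u)


def knn_classify(query, labeled_vectors, k=3):
    """Majority vote among the k training vectors most similar to the query."""
    ranked = sorted(
        labeled_vectors,
        key=lambda item: string_vector_similarity(query, item[0]),
        reverse=True,
    )
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


# Toy training set; the texts and labels are invented for this example.
train = [
    (to_string_vector("stock market stock price market stock"), "business"),
    (to_string_vector("stock market stock bank market stock"), "business"),
    (to_string_vector("game team game score team game"), "sports"),
    (to_string_vector("game team game coach team game"), "sports"),
]

query = to_string_vector("stock market stock trade market stock")
predicted = knn_classify(query, train, k=3)
```

Because each vector element is a word rather than a numeric weight, the representation stays short and readable, which is the contrast with high-dimensional, sparse numerical vectors that the abstract draws. A more realistic word-level similarity could replace `word_similarity` without changing the rest of the sketch.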