Academic Paper

Text categorization using hybrid (mined) terms (poster session)
Document Type
Conference
Source
Proceedings of the fifth international workshop on Information retrieval with Asian languages, pp. 217-218
Subject
data mining
evaluation
text categorization
Language
English
Abstract
This paper evaluated text categorization using characters, bigrams, words, and hybrid terms. These terms were also augmented with mined terms. Classifiers using hybrid terms did not achieve better classification performance. Using data mining techniques to add new terms to the dictionary improved the performance of character-based classifiers. Our naive comparison between the Pat-tree classifier and our best classifier shows that the Pat-tree classifier has the best precision (77%), while our best classifier has the best recall (72%) and the lowest storage requirement (13%).
