Academic paper

문서 군집화를 위한 워드 임베딩, PCA와 K-평균 군집의 결합 / Association of Word Embeddings, PCA and K-means for text clustering
Document Type
Dissertation/Thesis
Source
Subject
Document clustering
Word embedding
Dimensionality reduction
K-means clustering
Language
Korean
Abstract
This study addresses the growing difficulty users face in finding the information they want as the number of documents on diverse topics increases rapidly in the modern web environment. Document clustering can improve the accessibility and usability of information: by grouping documents with similar features, it helps users find what they need quickly and easily. This study proposes PCA-KM for document clustering. PCA-KM reduces the dimensionality of document vectors obtained through word embedding using PCA (Principal Component Analysis) and then applies a modified K-means clustering algorithm to the reduced vectors. The proposed method was compared, in terms of clustering performance metrics, with the conventional approach of applying K-means directly to word-embedding vectors. The proposed method achieved comparable or superior document clustering performance. It is therefore expected to contribute to more efficient document search services.
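The pipeline described in the abstract (document vectors, then PCA, then K-means) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the thesis's implementation: the abstract does not specify how K-means was modified, so plain Lloyd's K-means is used as a stand-in, and random synthetic vectors stand in for real word-embedding document vectors. The function names `pca_reduce`, `kmeans`, and `pca_km` and all parameter values are hypothetical.

```python
import numpy as np


def pca_reduce(X, n_components):
    """Project the rows of X onto the top principal components (via SVD)."""
    X_centered = X - X.mean(axis=0)
    # Rows of Vt are principal directions, ordered by explained variance.
    _, _, Vt = np.linalg.svd(X_centered, full_matrices=False)
    return X_centered @ Vt[:n_components].T


def kmeans(X, k, n_iter=100, seed=0):
    """Plain Lloyd's K-means (a stand-in for the unspecified modified variant)."""
    rng = np.random.default_rng(seed)
    # Initialise centroids from k distinct data points.
    centroids = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        # Assign each point to its nearest centroid (Euclidean distance).
        dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Recompute centroids; keep the old centroid if a cluster is empty.
        new_centroids = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids
    return labels, centroids


def pca_km(doc_vectors, n_components, k, seed=0):
    """Reduce document vectors with PCA, then cluster the reduced vectors."""
    reduced = pca_reduce(doc_vectors, n_components)
    labels, _ = kmeans(reduced, k, seed=seed)
    return reduced, labels


if __name__ == "__main__":
    # Synthetic stand-in for document embeddings: 3 groups in 50 dimensions.
    rng = np.random.default_rng(42)
    centers = rng.normal(scale=10.0, size=(3, 50))
    docs = np.vstack([c + rng.normal(size=(30, 50)) for c in centers])
    reduced, labels = pca_km(docs, n_components=5, k=3)
    print(reduced.shape, np.bincount(labels, minlength=3))
```

The baseline that the abstract compares against would correspond to calling `kmeans` directly on the unreduced document vectors. In practice the number of components and the number of clusters would be chosen from the data rather than fixed as they are here.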