Academic Paper

문서 군집화를 위한 워드 임베딩, PCA와 K-평균 군집의 결합 / Association of Word Embeddings, PCA and K-means for text clustering

Document Type

Dissertation/ Thesis

Author

김동현 / Kim, Donghyun

Source

Subject

Document clustering
Word embedding
Dimensionality reduction
K-means clustering

Language

Korean

Abstract

This study addresses a growing problem in the modern web environment: as the number of available documents across topics surges, users find it increasingly difficult to locate the information they want. Document clustering, which groups documents with similar features, can improve the accessibility and usability of information and help users find what they need quickly. This study proposes PCA-KM for document clustering. PCA-KM reduces the dimensionality of document vectors obtained through word embedding using PCA (Principal Component Analysis) and then applies a modified K-means clustering to the reduced vectors. The proposed method was compared, in terms of clustering performance metrics, with the conventional approach of applying K-means directly to word-embedding vectors. The proposed method achieved comparable or superior clustering performance. It is therefore expected to contribute to more efficient document search services.
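
The pipeline described in the abstract (word-embedding document vectors, then PCA, then K-means) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the thesis's implementation: the toy embeddings and documents are invented, mean pooling of word vectors is an assumed way to form document vectors, and plain Lloyd's K-means with a deterministic farthest-first initialization stands in for the "modified K-means", whose details the abstract does not give. The function names are hypothetical.

```python
import numpy as np

def document_vectors(docs, embeddings):
    """Average the word vectors of each tokenized document (assumed pooling rule)."""
    return np.array([np.mean([embeddings[w] for w in doc], axis=0) for doc in docs])

def pca_reduce(X, n_components):
    """Project the rows of X onto the top principal components via SVD."""
    X_centered = X - X.mean(axis=0)
    _, _, vt = np.linalg.svd(X_centered, full_matrices=False)
    return X_centered @ vt[:n_components].T

def kmeans(X, k, n_iter=100):
    """Plain Lloyd's K-means (a stand-in, not the thesis's modified variant)."""
    # Farthest-first initialization, chosen here only to keep the sketch deterministic.
    centroids = [X[0]]
    for _ in range(1, k):
        nearest = np.min([np.linalg.norm(X - c, axis=1) for c in centroids], axis=0)
        centroids.append(X[nearest.argmax()])
    centroids = np.array(centroids)
    for _ in range(n_iter):
        # Assign each point to its nearest centroid.
        dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Recompute centroids; an empty cluster keeps its previous centroid.
        updated = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        if np.allclose(updated, centroids):
            break
        centroids = updated
    return labels

def pca_km(docs, embeddings, n_components, k):
    """Embed documents, reduce dimensionality with PCA, then cluster."""
    X = document_vectors(docs, embeddings)
    return kmeans(pca_reduce(X, n_components), k)

# Hypothetical 4-dimensional embeddings for two rough topics (sports, finance).
embeddings = {
    "soccer": [0.9, 0.8, 0.1, 0.0], "goal": [0.8, 0.9, 0.0, 0.1],
    "team":   [0.9, 0.7, 0.1, 0.1], "bank": [0.1, 0.0, 0.9, 0.8],
    "loan":   [0.0, 0.1, 0.8, 0.9], "stock": [0.1, 0.1, 0.9, 0.7],
}
docs = [["soccer", "goal"], ["team", "soccer"], ["goal", "team"],
        ["bank", "loan"], ["stock", "bank"], ["loan", "stock"]]

labels = pca_km(docs, embeddings, n_components=2, k=2)
print(labels)
```

On this toy input the three sports documents and the three finance documents land in separate clusters. In practice, the number of retained components and the number of clusters would have to be chosen for the corpus at hand; the abstract does not state the values used in the study.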

Online Access

Full Text (dCollection)

Pusan National University Library
