학술논문

Iterative clustering of high dimensional text data augmented by local search

Document Type

Conference

Author

Source

2002 IEEE International Conference on Data Mining, 2002. Proceedings. Data mining Data Mining, 2002. ICDM 2003. Proceedings. 2002 IEEE International Conference on. :131-138 2002

Subject

Computing and Processing
Clustering algorithms
Iterative algorithms
Data mining
Frequency
Mathematics
Statistics
Euclidean distance
Information retrieval
Refining

Language

Abstract

The k-means algorithm with cosine similarity, also known as the spherical k-means algorithm, is a popular method for clustering document collections. However spherical k-means can often yield qualitatively poor results, especially when cluster sizes are small, say 25-30 documents per cluster, where it tends to get stuck at a local maximum far away from the optimal solution. In this paper, we present a local search procedure, which we call 'first-variation" that refines a given clustering by incrementally moving data points between clusters, thus achieving a higher objective function value. An enhancement of first variation allows a chain of such moves in a Kernighan-Lin fashion and leads to a better local maximum. Combining the enhanced first-variation with spherical k-means yields a powerful "ping-pong" strategy that often qualitatively improves k-means clustering and is computationally efficient. We present several experimental results to highlight the improvement achieved by our proposed algorithm in clustering high-dimensional and sparse text data.

Online Access

Full Text (IEEE) Find it@PNU

이메일

부산대학교 도서관

Online Access

메일 발송