Academic Paper

Efficient k-means++ with random projection
Document Type
Conference
Source
2017 International Joint Conference on Neural Networks (IJCNN), pp. 94-100, May 2017
Subject
Bioengineering
Components, Circuits, Devices and Systems
Computing and Processing
General Topics for Engineers
Robotics and Control Systems
Signal Processing and Analysis
Language
ISSN
2161-4407
Abstract
We propose three algorithms that approximate k-means++ with complexity only O(nk), independent of the dimensionality D, where n is the number of data points and k the number of clusters. By combining random projection with k-means++, the proposed algorithms still enjoy the performance lower bound established for k-means++, while the errors introduced by the approximation are bounded by the Johnson-Lindenstrauss (JL) lemma and are sufficiently low in the experiments we present. By approximating D(x)² in the k-means++ algorithm, we obtain an approximate algorithm of complexity only O(nk); the approximation to D(x)² is obtained through random projections, which perturb the distances only to a provably small extent. Our experiments on real-world datasets show that, when k is small, the minimized potential deteriorates by only 3%–5% while the execution time is reduced to 50% of the original. With a large k, compared to the results for small values of k, the error introduced by random projection is much reduced and the speedup is greater: in our experiments, the proposed algorithms are up to 20 times faster than k-means++.
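The core idea in the abstract can be sketched as follows: project the data to a low dimension with a random matrix, then run standard k-means++ seeding using squared distances D(x)² computed in the projected space. This is a minimal illustrative sketch of that idea, not the paper's exact algorithms; the function name and the projection dimension d are assumptions for illustration.

```python
import numpy as np

def rp_kmeanspp_seeding(X, k, d=10, rng=None):
    """Sketch: k-means++ seeding with distances approximated via
    random projection (distortion bounded by the JL lemma).

    X : (n, D) data matrix; returns the indices of the k seeds.
    The names and the choice of d are illustrative, not from the paper.
    """
    rng = np.random.default_rng(rng)
    n, D = X.shape
    # Gaussian random projection to d dimensions (scaled per JL lemma)
    R = rng.standard_normal((D, d)) / np.sqrt(d)
    Y = X @ R
    # Standard k-means++ seeding, but D(x)^2 is computed on the
    # projected points Y instead of the original D-dimensional X
    centers = [int(rng.integers(n))]
    d2 = np.sum((Y - Y[centers[0]]) ** 2, axis=1)
    for _ in range(1, k):
        c = int(rng.choice(n, p=d2 / d2.sum()))  # sample ∝ D(x)^2
        centers.append(c)
        d2 = np.minimum(d2, np.sum((Y - Y[c]) ** 2, axis=1))
    return np.array(centers)
```

Each seeding step costs O(nd) in the projected space rather than O(nD), which is where the claimed independence from D comes from; the one-time projection X @ R is amortized over all k steps.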