학술논문

Greedy Centroid Initialization for Federated K-means
Document Type
Conference
Source
2023 57th Annual Conference on Information Sciences and Systems (CISS) Information Sciences and Systems (CISS), 2023 57th Annual Conference on. :1-6 Mar, 2023
Subject
Communication, Networking and Broadcast Technologies
Computing and Processing
Photonics and Electrooptics
Robotics and Control Systems
Signal Processing and Analysis
Greedy algorithms
Clustering algorithms
Partitioning algorithms
Servers
K-means
Clustering
Federated Learning
Machine Learning
Language
Abstract
K-means is a widely used data clustering algorithm which aims to partition a set of data points into $K$ clusters through finding the best $K$ centroids representing the data points. Initialization plays a vital role in the traditional centralized K-means clustering algorithm where the clustering is carried out at a central node accessing the entire data points. In this paper, we focus on K-means in a federated setting, where the clients store data locally, and the raw data never leaves the devices. Given the importance of initialization on the federated K-means algorithm, we aim to find better initial centroids by leveraging the local data on each client. To this end, we start the centroid initialization at the clients rather than at the server, which has no information about the clients' data initially. The clients first select their local initial clusters, and they share their clustering information (cluster centroids and sizes) with the server. The server then uses a greedy algorithm to choose the global initial centroids based on the information received from the clients. Numerical results on synthetic and public datasets show that our proposed method can achieve better and more stable performance than three federated K-means variants, and similar performance to the centralized K-means algorithm.