Academic Journal

OCTOPUS: Overcoming Performance and Privatization Bottlenecks in Distributed Learning
Document Type
Periodical
Source
IEEE Transactions on Parallel and Distributed Systems, 33(12):3460-3477, Dec. 2022
Subject
Computing and Processing
Communication, Networking and Broadcast Technologies
Distributed databases
Task analysis
Servers
Privatization
Data models
Dictionaries
Training
Distributed learning
data collection
representation learning
disentanglement
privatization
Language
English
ISSN
1045-9219
1558-2183
2161-9883
Abstract
The diversity and quantity of data gathered from distributed devices such as mobile devices into data warehouses can enhance the success and robustness of machine learning algorithms. Federated learning enables distributed participants to collaboratively learn a commonly shared model while keeping their data local. However, it suffers from expensive communication and from limitations imposed by the heterogeneity of distributed data sources and the lack of access to global data. In this paper, we investigate a practical distributed learning scenario in which multiple downstream tasks (e.g., classifiers) can be efficiently learned from dynamically updated and non-IID distributed data sources while local data remain privatized. We introduce a new distributed/collaborative learning scheme that addresses communication overhead via latent compression, leveraging global data while privatizing local data without the additional cost of encryption or perturbation. The scheme divides learning into (1) informative feature encoding, which transmits the latent representation of local data to address communication overhead, and (2) downstream tasks centralized at the server, which use the encoded codes gathered from each node to address computing overhead. In addition, a disentanglement strategy is applied to privatize the sensitive components of local data. Extensive experiments are conducted on image and speech datasets. The results demonstrate that downstream tasks trained on the compact latent representations, with local data privatized, can achieve accuracy comparable to centralized learning.
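The abstract describes a two-stage split: clients encode local data into compact latent codes and upload only the non-sensitive portion, while the server trains downstream tasks on the received codes. The sketch below illustrates that flow in PyTorch as a minimal, hedged example; it is not the authors' implementation, and the class names (ClientEncoder, ServerClassifier), latent dimensions, the content/private split, and the assumption that labels are available at the server are all hypothetical.

```python
# Hypothetical sketch (not the paper's code): a client-side encoder maps local
# data to a compact latent code disentangled into a "content" part that is
# transmitted and a "private" part that never leaves the client; the server
# trains a downstream classifier on the transmitted content codes only.
import torch
import torch.nn as nn

class ClientEncoder(nn.Module):
    """Encodes a local sample into content and private latent factors."""
    def __init__(self, in_dim=784, content_dim=32, private_dim=8):
        super().__init__()
        self.backbone = nn.Sequential(nn.Linear(in_dim, 256), nn.ReLU())
        self.to_content = nn.Linear(256, content_dim)   # uploaded to the server
        self.to_private = nn.Linear(256, private_dim)   # kept on the client

    def forward(self, x):
        h = self.backbone(x)
        return self.to_content(h), self.to_private(h)

class ServerClassifier(nn.Module):
    """Downstream task trained centrally on the gathered content codes."""
    def __init__(self, content_dim=32, num_classes=10):
        super().__init__()
        self.head = nn.Sequential(nn.Linear(content_dim, 64), nn.ReLU(),
                                  nn.Linear(64, num_classes))

    def forward(self, z_content):
        return self.head(z_content)

encoder = ClientEncoder()
classifier = ServerClassifier()
opt = torch.optim.Adam(classifier.parameters(), lr=1e-3)

# Toy round: stand-in for one client's local batch and labels.
x_local = torch.randn(16, 784)
y_local = torch.randint(0, 10, (16,))

with torch.no_grad():                     # encoder assumed pre-trained locally
    z_content, z_private = encoder(x_local)

logits = classifier(z_content)            # server sees z_content, not x_local
loss = nn.functional.cross_entropy(logits, y_local)
opt.zero_grad(); loss.backward(); opt.step()
```

In this toy round, only `z_content` (32 floats per sample instead of 784 raw features) crosses the network, which is the communication saving the abstract refers to, and the sensitive factor `z_private` stays on the device, which stands in for the disentanglement-based privatization.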