Academic Paper

Data Transfer for Balancing Model Convergence and Training Time in Federated Learning
Document Type
Conference
Source
GLOBECOM 2023 - 2023 IEEE Global Communications Conference, pp. 6777-6782, Dec. 2023
Subject
Communication, Networking and Broadcast Technologies
Components, Circuits, Devices and Systems
Engineering Profession
General Topics for Engineers
Power, Energy and Industry Applications
Signal Processing and Analysis
Keywords
Training
Federated learning
Data transfer
Data models
Servers
Optimization
Convergence
conditional optimization problem
routing optimization
Language
English
ISSN
2576-6813
Abstract
Federated learning is a distributed machine learning technique that addresses challenges of traditional centralized machine learning, such as high computational expense and network congestion. In federated learning, a central server distributes a model to each server, and each server trains the model on its own data. After training, the local models are aggregated at the central server, and the aggregated model is redistributed to the servers for further training. Federated learning can thus train a model on all the data without gathering the data in one place, reducing the burden on the network and the cost of training on large amounts of data. However, federated learning faces two particular challenges: prolonged training time, which depends on the amount of data at each server, the processing capacity of each server, and the bandwidth of the network links; and the accuracy of the trained model, which is influenced by the heterogeneity of the servers' datasets. We formulate an optimization problem that balances training time and convergence behavior in federated learning. Specifically, we present a nonlinear programming problem that minimizes the total training time while optimizing the data transfer route and destination for each router, subject to constraints that avoid network congestion and guarantee convergence of the trained model. We evaluate the proposed method in an experiment on a virtual GPU cluster and show that the proposed method improves both the accuracy of the trained models and the training time compared to a prior method.
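The distribute-train-aggregate cycle described in the abstract can be illustrated with a minimal federated-averaging sketch. This is only a generic FedAvg-style round, not the paper's method: the scalar linear model, learning rate, and data-size-weighted aggregation are all illustrative assumptions, and the paper's data transfer routing and nonlinear programming formulation are not reproduced here.

```python
# Minimal sketch of one federated learning cycle (FedAvg-style aggregation).
# All model and hyperparameter choices below are illustrative assumptions,
# not the method proposed in the paper.

def local_update(w, data, lr=0.01, epochs=5):
    """One server's local training: gradient descent on squared loss
    for a scalar linear model y = w * x, starting from the global weight."""
    for _ in range(epochs):
        grad = sum(2 * x * (w * x - y) for x, y in data) / len(data)
        w -= lr * grad
    return w

def federated_round(global_w, server_datasets):
    """One round: the central server distributes global_w, each server trains
    on its own data, and the results are averaged weighted by data size."""
    local_ws = [local_update(global_w, d) for d in server_datasets]
    sizes = [len(d) for d in server_datasets]
    return sum(w * n for w, n in zip(local_ws, sizes)) / sum(sizes)

# Heterogeneous data amounts per server, all drawn from y = 3 * x.
servers = [
    [(x, 3.0 * x) for x in (1, 2, 3, 4, 5)],
    [(x, 3.0 * x) for x in (2, 3, 4)],
    [(x, 3.0 * x) for x in (1, 2)],
]

w = 0.0
for _ in range(10):  # repeated distribute-train-aggregate rounds
    w = federated_round(w, servers)
# w now approximates the true coefficient 3.0 without pooling the data
```

Note how no server ever ships its raw data to the central server; only model weights cross the network, which is the congestion-reducing property the abstract relies on.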