Academic Paper

A Hierarchical Communication Algorithm for Distributed Deep Learning Training
Document Type
Conference
Source
2023 IEEE 66th International Midwest Symposium on Circuits and Systems (MWSCAS), pp. 526-530, Aug. 2023
Subject
Components, Circuits, Devices and Systems
Deep learning
Training
Performance evaluation
Quantization (signal)
Clustering algorithms
Bandwidth
Optimization
Distributed Training
Computer Network
Language
English
ISSN
1558-3899
Abstract
Distributed deep learning training has become an important workload on data-center GPU clusters. However, in some settings the inter-node bandwidth is limited (e.g., 20 Gbps) and becomes a performance bottleneck that prevents existing deep learning systems from scaling training across multiple nodes. To address this bottleneck, we propose a hierarchical communication algorithm, named AS-SGD, that combines Asynchronous SGD and Synchronous SGD to make full use of both inter-node and intra-node network bandwidth. Moreover, a set of system optimization techniques, such as quantization and decentralization, is applied to further reduce communication costs. Finally, we present a performance evaluation of our algorithm on a 4-node cluster (each node with 8 Nvidia Tesla V100 GPUs). Experiments show that our algorithm achieves up to a 4.95X speedup over existing state-of-the-art systems on popular deep learning models and datasets.
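
Below is a minimal PyTorch sketch (not the authors' implementation) of the hierarchical pattern the abstract describes: gradients are averaged synchronously over the GPUs inside a node, while one "leader" GPU per node exchanges quantized updates asynchronously across the slow inter-node link. Pairing Synchronous SGD with the intra-node phase and Asynchronous SGD with the inter-node phase follows the name AS-SGD but is an assumption here, as are GPUS_PER_NODE, the int8 quantizer, and the ring-style leader exchange.

import torch
import torch.distributed as dist

GPUS_PER_NODE = 8   # assumption: 8 Tesla V100 GPUs per node, as in the paper's cluster
INTRA_GROUPS = {}   # one process group per node, built once at startup


def build_groups(world_size):
    # new_group() is collective: every rank must call it for every group.
    # Assumes dist.init_process_group(...) has already been called.
    for node in range(world_size // GPUS_PER_NODE):
        ranks = list(range(node * GPUS_PER_NODE, (node + 1) * GPUS_PER_NODE))
        INTRA_GROUPS[node] = dist.new_group(ranks)


def quantize_int8(t):
    # Uniform int8 quantization; an illustrative stand-in for the paper's scheme.
    scale = t.abs().max().clamp(min=1e-8) / 127.0
    return (t / scale).round().to(torch.int8), scale.reshape(1)


def hierarchical_step(grad, rank, world_size):
    node, local_rank = divmod(rank, GPUS_PER_NODE)

    # Phase 1 (synchronous, intra-node): average gradients over the GPUs
    # sharing this node's fast NVLink/PCIe interconnect.
    dist.all_reduce(grad, op=dist.ReduceOp.SUM, group=INTRA_GROUPS[node])
    grad /= GPUS_PER_NODE

    # Phase 2 (asynchronous, inter-node): only the node leader (local_rank 0)
    # crosses the slow (~20 Gbps) link, pushing a quantized gradient to the
    # next node in a ring without blocking on a global barrier. The matching
    # irecv on the receiving leader is omitted for brevity; the returned
    # handles should eventually be wait()ed on.
    if local_rank == 0:
        q, scale = quantize_int8(grad)
        num_nodes = world_size // GPUS_PER_NODE
        next_leader = ((node + 1) % num_nodes) * GPUS_PER_NODE
        req_q = dist.isend(q, dst=next_leader)
        req_s = dist.isend(scale, dst=next_leader)
    return grad

The ring exchange stands in for the decentralization the abstract mentions: no parameter server is involved, and each node proceeds with its locally reduced gradient while inter-node traffic completes in the background.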