Academic Paper

A Hierarchical Communication Algorithm for Distributed Deep Learning Training
Document Type
Conference
Source
2023 IEEE 66th International Midwest Symposium on Circuits and Systems (MWSCAS), pp. 526-530, Aug. 2023
Subject
Components, Circuits, Devices and Systems
Deep learning
Training
Performance evaluation
Quantization (signal)
Clustering algorithms
Bandwidth
Optimization
Distributed Training
Computer Network
Language
English
ISSN
1558-3899
Abstract
Distributed deep learning training has become an important workload on data-center GPU clusters. However, in some settings the inter-node bandwidth is limited (e.g., 20 Gbps) and becomes a performance bottleneck that prevents existing deep learning systems from scaling training across multiple nodes. To address this bottleneck, we propose a hierarchical communication algorithm, named AS-SGD, that combines Asynchronous SGD and Synchronous SGD to make full use of both inter-node and intra-node network bandwidth. Moreover, a set of system optimization techniques, such as quantization and decentralization, is applied to further reduce communication costs. Finally, we present a performance evaluation of our algorithm on a 4-node cluster (each node with 8 Nvidia Tesla V100 GPUs). Experiments show that our algorithm achieves up to a 4.95X speedup over existing state-of-the-art systems on popular deep learning models and datasets.
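
Below is a minimal PyTorch sketch (not the authors' implementation) of the hierarchical pattern the abstract describes: gradients are averaged synchronously over the GPUs inside a node, while one "leader" GPU per node exchanges quantized updates asynchronously across the slow inter-node link. Pairing Synchronous SGD with the intra-node phase and Asynchronous SGD with the inter-node phase follows the name AS-SGD but is an assumption here, as are GPUS_PER_NODE, the int8 quantizer, and the ring-style leader exchange.

import torch
import torch.distributed as dist

GPUS_PER_NODE = 8   # assumption: 8 Tesla V100 GPUs per node, as in the paper's cluster
INTRA_GROUPS = {}   # one process group per node, built once at startup


def build_groups(world_size):
    # new_group() is collective: every rank must call it for every group.
    # Assumes dist.init_process_group(...) has already been called.
    for node in range(world_size // GPUS_PER_NODE):
        ranks = list(range(node * GPUS_PER_NODE, (node + 1) * GPUS_PER_NODE))
        INTRA_GROUPS[node] = dist.new_group(ranks)


def quantize_int8(t):
    # Uniform int8 quantization; an illustrative stand-in for the paper's scheme.
    scale = t.abs().max().clamp(min=1e-8) / 127.0
    return (t / scale).round().to(torch.int8), scale.reshape(1)


def hierarchical_step(grad, rank, world_size):
    node, local_rank = divmod(rank, GPUS_PER_NODE)

    # Phase 1 (synchronous, intra-node): average gradients over the GPUs
    # sharing this node's fast NVLink/PCIe interconnect.
    dist.all_reduce(grad, op=dist.ReduceOp.SUM, group=INTRA_GROUPS[node])
    grad /= GPUS_PER_NODE

    # Phase 2 (asynchronous, inter-node): only the node leader (local_rank 0)
    # crosses the slow (~20 Gbps) link, pushing a quantized gradient to the
    # next node in a ring without blocking on a global barrier. The matching
    # irecv on the receiving leader is omitted for brevity; the returned
    # handles should eventually be wait()ed on.
    if local_rank == 0:
        q, scale = quantize_int8(grad)
        num_nodes = world_size // GPUS_PER_NODE
        next_leader = ((node + 1) % num_nodes) * GPUS_PER_NODE
        req_q = dist.isend(q, dst=next_leader)
        req_s = dist.isend(scale, dst=next_leader)
    return grad

The ring exchange stands in for the decentralization the abstract mentions: no parameter server is involved, and each node proceeds with its locally reduced gradient while inter-node traffic completes in the background.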