Academic Paper

DeAR: Accelerating Distributed Deep Learning with Fine-Grained All-Reduce Pipelining
Document Type
Conference
Source
2023 IEEE 43rd International Conference on Distributed Computing Systems (ICDCS), pp. 142-153, Jul. 2023
Subject
Communication, Networking and Broadcast Technologies
Computing and Processing
Training
Backpropagation
Deep learning
Tensors
Scheduling algorithms
Computational modeling
Ethernet
Language
English
ISSN
2575-8411
Abstract
Communication scheduling, which allows all-reduce communications to be overlapped with backpropagation computations, has been shown to be effective in accelerating distributed training and is commonly adopted in popular distributed deep learning frameworks. However, two fundamental problems remain: (1) each all-reduce operation incurs an excessive startup latency that grows with the number of workers; and (2) training performance is sub-optimal because the feed-forward computation of the next iteration depends on, and must synchronize with, the completed gradient aggregation. We propose a novel scheduling algorithm, DeAR, that decouples the all-reduce primitive into two continuous operations, which overlap with both backpropagation and feed-forward computations without extra communications. We further design a practical tensor fusion algorithm to improve training performance. Experimental results with five popular models show that DeAR achieves up to 83% and 15% training speedup over state-of-the-art solutions on a 64-GPU cluster with 10Gb/s Ethernet and 100Gb/s InfiniBand interconnects, respectively.
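The sketch below illustrates the core idea described in the abstract: splitting a single all-reduce into two collectives (reduce-scatter followed by all-gather) so that the first half can overlap with backpropagation and the second half with the next iteration's feed-forward pass. It is a minimal illustration using PyTorch's torch.distributed collectives, not the paper's implementation; the function name decoupled_allreduce, the flattening step, and the divisibility assumption are assumptions made here for brevity.

```python
import torch
import torch.distributed as dist

def decoupled_allreduce(grad: torch.Tensor) -> torch.Tensor:
    """Average `grad` across workers via reduce-scatter + all-gather
    instead of a single all-reduce call (illustrative sketch only)."""
    world_size = dist.get_world_size()
    flat = grad.contiguous().view(-1)
    # This sketch assumes the tensor size is divisible by the number of
    # workers; a practical implementation would pad the tensor instead.
    assert flat.numel() % world_size == 0

    # Stage 1: reduce-scatter. In DeAR-style scheduling this half can be
    # launched as soon as the layer's gradient is produced during
    # backpropagation.
    shard = torch.empty(flat.numel() // world_size,
                        dtype=flat.dtype, device=flat.device)
    dist.reduce_scatter_tensor(shard, flat)   # each rank holds one reduced shard
    shard /= world_size                       # average across workers

    # Stage 2: all-gather. This half has no remaining dependency on
    # backpropagation, so it can be deferred and overlapped with the next
    # iteration's feed-forward pass, just before the layer needs its
    # aggregated gradient.
    dist.all_gather_into_tensor(flat, shard)  # reassemble the full averaged gradient
    return flat.view_as(grad)
```

Because the two halves of the collective are independent operations, the synchronization barrier that a monolithic all-reduce imposes before the next forward pass disappears, which is the overlap opportunity the abstract refers to.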