Academic Paper

Fast Performance Prediction for Efficient Distributed DNN Training
Document Type
Periodical
Author
Source
IEEE Computer Architecture Letters, 22(2):133-136, Dec. 2023
Subject
Computing and Processing
Optimization
Costs
Performance evaluation
Parallel processing
Training
Throughput
Tensors
Distributed training
performance modeling
large language model
3D parallelism
Language
English
ISSN
1556-6056
1556-6064
2473-2575
Abstract
Training large-scale DNN models requires parallel distributed training on hyper-scale systems. To make the best use of the numerous accelerators, it is essential to intelligently combine different parallelization schemes. However, as the size of DNN models increases, the number of possible combinations of schemes becomes enormous, and consequently, finding the optimal parallel plan becomes exceedingly expensive and practically infeasible. In this letter, we introduce a novel cost model, the Markovian Performance Estimator (MPE). This model provides affordable estimates of the throughput of various parallel plans, enabling efficient and fast searches for the optimal parallel plan even when resources are limited. Notably, this work is pioneering in explaining the expensive nature of the search for an optimal plan and in addressing it with intuitive performance estimations based on real-device evaluations. Our experiments demonstrate the effectiveness of the MPE, revealing that it accelerates the optimization process by up to 126x (36.4x on average) over the existing state-of-the-art baseline, Alpa.
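
To make the abstract's idea concrete, the sketch below shows, in Python, how a throughput cost model can guide a brute-force search over 3D-parallel plans (data, tensor, and pipeline parallelism degrees). It is a minimal illustration under stated assumptions: the cost function, the device count, and all names (candidate_plans, estimated_step_time, best_plan, comm_cost) are placeholders invented here, not the paper's Markovian Performance Estimator or any real library API.

# Illustrative sketch only: cost-model-guided search over 3D-parallel plans.
# The analytic cost function is an assumption for demonstration, NOT the MPE.
from itertools import product

def candidate_plans(num_devices):
    """Yield (data, tensor, pipeline) parallel degrees whose product uses all devices."""
    for dp, tp, pp in product(range(1, num_devices + 1), repeat=3):
        if dp * tp * pp == num_devices:
            yield dp, tp, pp

def estimated_step_time(dp, tp, pp, compute_time=1.0, comm_cost=0.05):
    """Placeholder cost model (assumption): compute scales down with parallelism,
    while each parallelism dimension adds its own communication or bubble overhead."""
    compute = compute_time / (dp * tp * pp)      # ideal compute scaling
    grad_sync = comm_cost * (dp - 1)             # data-parallel gradient all-reduce
    tensor_comm = 2 * comm_cost * (tp - 1)       # tensor-parallel collectives
    pipeline_bubble = comm_cost * (pp - 1)       # pipeline warm-up/drain bubble
    return compute + grad_sync + tensor_comm + pipeline_bubble

def best_plan(num_devices):
    """Return the plan with the lowest estimated step time (highest throughput)."""
    return min(candidate_plans(num_devices),
               key=lambda plan: estimated_step_time(*plan))

if __name__ == "__main__":
    dp, tp, pp = best_plan(8)
    print(f"best plan for 8 devices: data={dp}, tensor={tp}, pipeline={pp}")

The point of such a model is that each candidate plan is scored analytically instead of being profiled on hardware, so the search over all valid degree combinations stays cheap; the paper's contribution lies in making these estimates accurate by grounding them in real-device measurements.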