Academic Paper

The Case for Strong Scaling in Deep Learning: Training Large 3D CNNs With Hybrid Parallelism

Document Type

Periodical

Author

Oyama, Y.; Maruyama, N.; Dryden, N.; McCarthy, E.; Harrington, P.; Balewski, J.; Matsuoka, S.; Nugent, P.; Van Essen, B.

Source

IEEE Transactions on Parallel and Distributed Systems (IEEE Trans. Parallel Distrib. Syst.), 32(7):1641-1652, Jul. 2021

Subject

Computing and Processing
Communication, Networking and Broadcast Technologies
Training
Three-dimensional displays
Computational modeling
Parallel processing
Solid modeling
Memory management
Image segmentation
Deep learning
convolutional neural network
model-parallel training
hybrid-parallel training

Language

ISSN

1045-9219
1558-2183
2161-9883

Abstract

We present scalable hybrid-parallel algorithms for training large-scale 3D convolutional neural networks. Deep learning-based emerging scientific workflows often require model training with large, high-dimensional samples, which can make training much more costly and even infeasible due to excessive memory usage. We solve these challenges by extensively applying hybrid parallelism throughout the end-to-end training pipeline, including both computations and I/O. Our hybrid-parallel algorithm extends the standard data parallelism with spatial parallelism, which partitions a single sample in the spatial domain, realizing strong scaling beyond the mini-batch dimension with a larger aggregated memory capacity. We evaluate our proposed training algorithms with two challenging 3D CNNs, CosmoFlow and 3D U-Net. Our comprehensive performance studies show that good weak and strong scaling can be achieved for both networks using up to 2K GPUs. More importantly, we enable training of CosmoFlow with much larger samples than previously possible, realizing an order-of-magnitude improvement in prediction accuracy.
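The spatial partitioning described in the abstract can be illustrated with a minimal single-process sketch. It is not the authors' implementation: it reduces the 3D volume to one spatial axis, replaces GPUs with a plain loop, and assumes an odd kernel size and equal-sized partitions. The function names `conv1d_same` and `spatial_parallel_conv1d` are hypothetical. Each partition ("slab") is extended with halo cells copied from its neighbours, which is the data a real distributed run would have to exchange between devices before each convolution.

```python
def conv1d_same(signal, kernel):
    """Reference: zero-padded 'same' cross-correlation over the whole sample."""
    r = len(kernel) // 2
    padded = [0] * r + list(signal) + [0] * r
    return [
        sum(kernel[j] * padded[i + j] for j in range(len(kernel)))
        for i in range(len(signal))
    ]


def spatial_parallel_conv1d(signal, kernel, num_parts):
    """Split one sample into num_parts slabs with halos and convolve slab by slab.

    Assumptions of this sketch: odd kernel size, equal-sized slabs, and a
    sequential loop standing in for the devices that would each own one slab.
    """
    n, r = len(signal), len(kernel) // 2
    assert len(kernel) % 2 == 1, "odd kernel sizes only in this sketch"
    assert n % num_parts == 0, "equal-sized slabs only in this sketch"
    size = n // num_parts
    out = []
    for p in range(num_parts):
        lo, hi = p * size, (p + 1) * size
        # Halo: r cells from each neighbouring slab; zeros at the outer boundary.
        slab = [signal[i] if 0 <= i < n else 0 for i in range(lo - r, hi + r)]
        # A 'valid' convolution over slab + halo yields exactly `size` outputs.
        out.extend(
            sum(kernel[j] * slab[i + j] for j in range(len(kernel)))
            for i in range(size)
        )
    return out


sample = list(range(12))   # one sample, viewed along a single spatial axis
kernel = [1, 2, 1]
assert spatial_parallel_conv1d(sample, kernel, 4) == conv1d_same(sample, kernel)
```

Because each slab stores only its own cells plus a thin halo, per-device memory shrinks as the number of partitions grows, which is how partitioning a single sample adds aggregated memory capacity on top of ordinary data parallelism.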
