Journal Article

Spartan: A Sparsity-Adaptive Framework to Accelerate Deep Neural Network Training on GPUs
Document Type
Periodical
Source
IEEE Transactions on Parallel and Distributed Systems, 32(10):2448-2463, Oct. 2021
Subject
Computing and Processing
Communication, Networking and Broadcast Technologies
Training
Monitoring
Sparse matrices
Graphics processing units
Acceleration
Market research
Engines
DNN
sparsity
GPU
Language
English
ISSN
1045-9219 (print)
1558-2183 (electronic)
2161-9883 (CD-ROM)
Abstract
Deep Neural Networks (DNNs) have emerged as an important class of machine learning algorithms, providing accurate solutions to a broad range of applications. Sparsity in the activation maps produced during DNN training presents an opportunity to reduce computation. However, exploiting activation sparsity poses two major challenges: i) profiling activation sparsity during training incurs significant overhead, both from computing the degree of sparsity and from the associated data movement; and ii) the dynamic nature of activation maps requires dense-to-sparse conversion on the fly during training, which is itself costly. In this article, we present Spartan, a lightweight hardware/software framework to accelerate DNN training on a GPU. Spartan offers a cost-effective and programmer-transparent microarchitectural solution for exploiting the activation sparsity detected during training, combining an efficient sparsity monitor, a tile-based sparse GEMM algorithm, and a novel compaction engine designed for GPU workloads. Spartan reduces sparsity profiling overhead by 52.5× on average. For the most compute-intensive layers, i.e., convolutional layers, it speeds up AlexNet by 3.4×, VGGNet-16 by 2.14×, and ResNet-18 by 2.02× when training on the ImageNet dataset.
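The abstract does not detail how the tile-based sparse GEMM works; a minimal CUDA sketch of the general idea is shown below. It performs a standard shared-memory tiled GEMM C = A × B, but first checks whether the current tile of A (the activation matrix) is entirely zero and, if so, skips the multiply-accumulate work for that tile. The tile size, the shared-flag zero check, and the kernel name sparse_tile_gemm are illustrative assumptions, not the paper's actual design, which relies on dedicated hardware support.

```cuda
// Illustrative sketch only: a tiled GEMM C = A * B that skips the
// inner-product work for tiles of the activation matrix A that are
// entirely zero. Not Spartan's actual implementation.
#define TILE 16

__global__ void sparse_tile_gemm(const float *A, const float *B, float *C,
                                 int M, int N, int K) {
    __shared__ float As[TILE][TILE];
    __shared__ float Bs[TILE][TILE];
    __shared__ int tileNonZero;  // block-wide flag: any nonzero in this A tile?

    int row = blockIdx.y * TILE + threadIdx.y;
    int col = blockIdx.x * TILE + threadIdx.x;
    float acc = 0.0f;

    for (int t = 0; t < (K + TILE - 1) / TILE; ++t) {
        int aCol = t * TILE + threadIdx.x;
        int bRow = t * TILE + threadIdx.y;
        float aVal = (row < M && aCol < K) ? A[row * K + aCol] : 0.0f;

        // Reset the flag, then let any thread holding a nonzero set it.
        if (threadIdx.x == 0 && threadIdx.y == 0) tileNonZero = 0;
        __syncthreads();
        if (aVal != 0.0f) tileNonZero = 1;  // benign race: all writers store 1
        __syncthreads();

        // The flag is uniform across the block, so this branch (and the
        // barriers inside it) is taken by all threads or by none.
        if (tileNonZero) {
            As[threadIdx.y][threadIdx.x] = aVal;
            Bs[threadIdx.y][threadIdx.x] =
                (bRow < K && col < N) ? B[bRow * N + col] : 0.0f;
            __syncthreads();
            for (int k = 0; k < TILE; ++k)
                acc += As[threadIdx.y][k] * Bs[k][threadIdx.x];
            __syncthreads();
        }
        // All-zero tiles contribute nothing to acc and are skipped entirely.
    }
    if (row < M && col < N) C[row * N + col] = acc;
}
```

In this simplified form the savings come only from skipped arithmetic and shared-memory traffic; the global loads of A used for the zero check remain, which is one reason the paper pairs sparse GEMM with a dedicated sparsity monitor and compaction engine rather than relying on a software-only check like this one.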