Academic Article

Hardware Accelerator Design for Sparse DNN Inference and Training: A Tutorial
Document Type
Periodical
Source
IEEE Transactions on Circuits and Systems II: Express Briefs, 71(3):1708-1714, Mar. 2024
Subject
Components, Circuits, Devices and Systems
Computational modeling
Tensors
Computational efficiency
Transformers
Encoding
Data models
Convolution
Hardware acceleration
sparsity
CNN
transformer
tutorial
deep learning
Language
English
ISSN
1549-7747 (Print)
1558-3791 (Electronic)
Abstract
Deep neural networks (DNNs) are widely used in many fields, such as artificial intelligence generated content (AIGC) and robotics. To support these tasks efficiently, model pruning techniques have been developed to compress computation- and memory-intensive DNNs. However, directly executing these sparse models on a common hardware accelerator can cause significant under-utilization, since invalid data resulting from the sparse patterns leads to unnecessary computations and irregular memory accesses. This brief analyzes the critical issues in accelerating sparse models and provides an overview of typical hardware designs for various sparse DNNs, such as convolutional neural networks (CNNs), recurrent neural networks (RNNs), generative adversarial networks (GANs), and Transformers. Following the overview, we give practical guidelines for designing efficient accelerators for sparse DNNs, with qualitative metrics to evaluate hardware overhead in different cases. In addition, we highlight potential opportunities for hardware/software/algorithm co-optimization from the perspective of sparse DNN implementation, and provide insights into recent design trends for the efficient implementation of Transformers with sparse attention, which facilitates large language model (LLM) deployment with high throughput and energy efficiency.
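To make the under-utilization issue in the abstract concrete, the following is a minimal sketch (not taken from the brief; all names, shapes, and the CSR encoding choice are illustrative assumptions) of a compressed-sparse-row matrix-vector product. The data-dependent gathers through the column-index array and the uneven nonzero counts per row show why pruned weights map poorly onto a dense, regular accelerator datapath.

```python
# Illustrative sketch: CSR (compressed sparse row) weight matrix times a
# dense activation vector. Not the paper's design; encoding and names are
# assumptions chosen for clarity.
import numpy as np

def csr_spmv(values, col_idx, row_ptr, x):
    """Compute y = W @ x where W is stored in CSR form.

    values  : nonzero weights, concatenated row by row
    col_idx : column index of each nonzero (drives irregular gathers)
    row_ptr : start offset of each row within values/col_idx
    """
    n_rows = len(row_ptr) - 1
    y = np.zeros(n_rows, dtype=x.dtype)
    for r in range(n_rows):
        acc = 0.0
        for k in range(row_ptr[r], row_ptr[r + 1]):
            # x[col_idx[k]] is a data-dependent (irregular) memory access;
            # on a dense accelerator such gathers break regular dataflow.
            acc += values[k] * x[col_idx[k]]
        y[r] = acc
    return y

# Example: a 3x4 weight matrix pruned to 4 nonzeros (~67% sparsity).
values  = np.array([2.0, -1.0, 0.5, 3.0])
col_idx = np.array([1, 3, 0, 2])
row_ptr = np.array([0, 2, 3, 4])  # rows hold 2, 1, 1 nonzeros: load imbalance
x = np.array([1.0, 2.0, 3.0, 4.0])
print(csr_spmv(values, col_idx, row_ptr, x))  # [0.0, 0.5, 9.0]
```

The per-row nonzero counts (2, 1, 1 here) also illustrate the load-imbalance problem the brief's accelerator overview addresses: parallel processing elements assigned one row each finish at different times unless the hardware or the pruning pattern rebalances the work.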