Academic Article

A 95.6-TOPS/W Deep Learning Inference Accelerator With Per-Vector Scaled 4-bit Quantization in 5 nm
Document Type
Periodical
Source
IEEE Journal of Solid-State Circuits, 58(4):1129-1141, Apr. 2023
Subject
Components, Circuits, Devices and Systems
Engineered Materials, Dielectrics and Plasmas
Computing and Processing
Transformers
Quantization (signal)
Arithmetic
Task analysis
Costs
Deep learning
Optimization
Accuracy-efficiency trade-off
BERT
deep neural network (DNN) inference accelerator
quantization
transformers
Language
English
ISSN
0018-9200
1558-173X
Abstract
The energy efficiency of deep neural network (DNN) inference can be improved with custom accelerators. DNN inference accelerators often employ specialized hardware techniques to improve energy efficiency, but many of these techniques result in catastrophic accuracy loss on transformer-based DNNs, which have become ubiquitous for natural language processing (NLP) tasks. This article presents a DNN accelerator designed for efficient execution of transformers. The proposed accelerator implements per-vector scaled quantization (VSQ), which employs an independent scale factor for each 64-element vector to enable the use of 4-bit arithmetic with little accuracy loss and low energy overhead. Using a multilevel dataflow to maximize reuse, the 5-nm prototype achieves 95.6 tera-operations per second per Watt (TOPS/W) at 0.46 V on a 4-bit benchmarking layer with VSQ. At a nominal voltage of 0.67 V, the accelerator achieves 1734 inferences/s/W (38.7 TOPS/W) with only 0.7% accuracy loss on BERT-Base and 4714 inferences/s/W (38.6 TOPS/W) with 0.15% accuracy loss on ResNet-50 by using quantization-aware fine-tuning to recover accuracy, demonstrating a practical accelerator design for energy-efficient DNN inference.
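The abstract describes per-vector scaled quantization (VSQ) as assigning an independent scale factor to each 64-element vector so that 4-bit arithmetic can be used with little accuracy loss. The sketch below illustrates that idea in NumPy under simplifying assumptions: symmetric signed 4-bit quantization, scales taken from each vector's maximum magnitude, and no modeling of the hardware datapath or the quantization of the scale factors themselves. The function names and implementation details are illustrative and are not taken from the paper.

```python
import numpy as np

def vsq_quantize(x, vector_size=64, bits=4):
    """Sketch of per-vector scaled quantization (VSQ).

    Splits x into contiguous vectors of `vector_size` elements and
    quantizes each vector to signed `bits`-bit integers using its own
    scale factor. Symmetric quantization is assumed; hardware details
    and scale-factor quantization are omitted.
    """
    qmax = 2 ** (bits - 1) - 1                 # e.g. 7 for 4-bit signed
    x = np.asarray(x, dtype=np.float32)
    pad = (-len(x)) % vector_size              # pad to a whole number of vectors
    vectors = np.pad(x, (0, pad)).reshape(-1, vector_size)

    # One scale factor per 64-element vector: the per-vector scale.
    scales = np.abs(vectors).max(axis=1, keepdims=True) / qmax
    scales[scales == 0] = 1.0                  # avoid division by zero

    q = np.clip(np.round(vectors / scales), -qmax - 1, qmax).astype(np.int8)
    return q, scales.squeeze(1)

def vsq_dequantize(q, scales, length=None):
    """Reconstruct approximate real values from the VSQ representation."""
    x = (q.astype(np.float32) * scales[:, None]).reshape(-1)
    return x if length is None else x[:length]

if __name__ == "__main__":
    x = np.random.randn(256).astype(np.float32)
    q, s = vsq_quantize(x)
    x_hat = vsq_dequantize(q, s, length=len(x))
    print("max abs reconstruction error:", np.abs(x - x_hat).max())
```

Because each 64-element vector carries its own scale, outliers in one vector do not inflate the quantization step for the rest of the tensor, which is why 4-bit arithmetic can retain accuracy; the paper additionally uses quantization-aware fine-tuning to recover the remaining accuracy loss.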