Academic paper

A 28nm 11.2TOPS/W Hardware-Utilization-Aware Neural-Network Accelerator with Dynamic Dataflow
Document Type
Conference
Source
2023 IEEE International Solid-State Circuits Conference (ISSCC), pp. 1-3, Feb. 2023
Subject
Bioengineering
Components, Circuits, Devices and Systems
Computing and Processing
Deep learning
Convolution
Shape
Neural networks
Parallel processing
Benchmark testing
Energy efficiency
Language
English
ISSN
2376-8606
Abstract
With the rapid evolution of AI technology, various neural-network structures have been developed for diverse applications. As a typical case, Fig. 22.4.1 shows that the convolution (Conv) layers used in convolutional neural networks (CNNs) feature distinct shapes and types. Neural-network accelerators with high peak energy efficiency have been demonstrated [1–4]. However, they usually suffer decreased hardware utilization (mainly of the multiply-accumulate (MAC) units) across varying network structures, which reduces the attainable energy efficiency accordingly. To improve MAC utilization, the Nvidia deep learning accelerator (NVDLA) [5] applies hardware parallelism along the channel direction, but utilization remains low for shallow layers: in our experiments, NVDLA achieves only 23% MAC utilization in the worst case. A Scatter-Gather scheme [4] mitigates the utilization drop for shallow layers by rearranging the input features (IF), but the improvement is limited. As depthwise convolution (Dwcv) is now widely used, its accompanying low MAC utilization must also be considered; taking MobileNetV2 as an example, NVDLA achieves only 0.4% utilization for Dwcv. To address these critical issues, this work presents a utilization-aware neural-network accelerator that dynamically changes the level of parallelism along multiple dimensions to maximize MAC utilization. The chip achieves >97.3% MAC utilization on benchmark networks while delivering 4.7× higher attainable energy efficiency than state-of-the-art designs [1–4].
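The utilization argument in the abstract can be sketched numerically. The following minimal Python model is illustrative only: the 64-lane array size and the "fold spatial work into idle channel lanes" mapping are assumptions for the sketch, not the chip's actual dataflow. It shows why channel-only parallelism (NVDLA-style) collapses for shallow and depthwise layers, while a mapping that reallocates idle lanes to another dimension restores utilization:

```python
from math import ceil

def util_fixed_channel(c_in, lanes=64):
    """MAC utilization when lanes are assigned only along input
    channels: each pass occupies min(c_in, lanes) of the lanes."""
    passes = ceil(c_in / lanes)
    return c_in / (passes * lanes)

def util_adaptive(c_in, spatial, lanes=64):
    """MAC utilization when idle channel lanes can be reassigned to
    spatial (pixel) parallelism, i.e. a dynamic multi-dimension mapping
    (assumed here for illustration)."""
    work = c_in * spatial  # independent MACs available per step
    passes = ceil(work / lanes)
    return work / (passes * lanes)

# Shallow RGB layer: only 3 input channels occupy a 64-lane array.
print(util_fixed_channel(3))          # 3/64 ≈ 4.7% utilization
print(util_adaptive(3, spatial=64))   # 192 MACs fill 3 full passes -> 1.0

# Depthwise conv: one input channel contributes per output channel.
print(util_fixed_channel(1))          # 1/64 ≈ 1.6% utilization
print(util_adaptive(1, spatial=64))   # 1.0
```

The gap between the two mappings in this toy model mirrors the abstract's motivation: fixed channel parallelism leaves most MACs idle exactly for the layer shapes (shallow, depthwise) where modern networks spend much of their compute.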