Academic Paper

Leakage Reuse for Energy Efficient Near-Memory Computing of Heterogeneous DNN Accelerators
Document Type
Periodical
Source
IEEE Journal on Emerging and Selected Topics in Circuits and Systems (IEEE J. Emerg. Sel. Topics Circuits Syst.), 11(4):762-775, Dec. 2021
Subject
Components, Circuits, Devices and Systems
System-on-chip
Random access memory
Memory management
Computer architecture
Leakage currents
Computational modeling
Receivers
Leakage reuse
near-memory computing
on-chip SRAM
heterogeneous DNN accelerator
ubiquitous DNN applications
edge devices
Language
ISSN
2156-3357
2156-3365
Abstract
The exploration of custom deep neural network (DNN) accelerators for highly energy-constrained edge devices with on-device intelligence is gaining traction in the research community. Despite the superior throughput and performance of custom accelerators as compared to CPUs or GPUs, the energy efficiency and versatility of state-of-the-art DNN accelerators are constrained by a) the storage and movement of a large volume of data and b) the limited scope of monolithic architectures, where the entire accelerator executes only a single model at any given time. In this paper, a multi-voltage domain heterogeneous DNN accelerator is proposed that executes multiple models simultaneously with different power-performance operating points. The proposed architecture concurrently implements near-memory computing and leakage reuse, where the leakage current of idle memory banks within each processing element is used to deliver current to the adjacent multiply-and-accumulate (MAC) units. The proposed architecture and circuit techniques are evaluated with SPICE simulation in a 65 nm CMOS technology. The simulation results indicate that the proposed heterogeneous architecture with leakage reuse achieves an energy efficiency of 3.27 tera-operations per second per watt (TOPS/W), whereas a conventional monolithic, single voltage domain architecture exhibits an energy efficiency of 0.0458 TOPS/W. In addition, the proposed accelerator, which implements the leakage reuse technique on only half of the memory elements storing the weights, reduces the power consumption of the sub-arrays of processing elements by 26% (99.4 mW) as compared to an accelerator that does not apply leakage reuse.
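The figures quoted in the abstract can be related to one another with a few lines of arithmetic. The sketch below is illustrative only and is not taken from the paper: the function names are hypothetical, and the implied baseline power assumes the rounded 26% figure and the 99.4 mW figure refer to the same sub-array power, so that result is approximate.

```python
# Figures quoted in the abstract.
PROPOSED_TOPS_PER_W = 3.27      # heterogeneous architecture with leakage reuse
BASELINE_TOPS_PER_W = 0.0458    # conventional monolithic, single voltage domain
POWER_SAVED_MW = 99.4           # absolute sub-array power reduction
POWER_SAVED_FRACTION = 0.26     # relative reduction (rounded in the abstract)


def efficiency_gain(proposed_tops_per_w, baseline_tops_per_w):
    """Ratio of the two energy-efficiency figures (dimensionless)."""
    return proposed_tops_per_w / baseline_tops_per_w


def picojoules_per_op(tops_per_w):
    """Energy per operation in pJ.

    1 TOPS/W equals 1e12 operations per joule, so the reciprocal of a
    TOPS/W figure is the energy per operation in picojoules.
    """
    return 1.0 / tops_per_w


def implied_baseline_power_mw(saved_mw, saved_fraction):
    """Sub-array power without leakage reuse implied by the two figures.

    Assumption: both figures describe the same quantity; the percentage
    is rounded, so the result is only approximate.
    """
    return saved_mw / saved_fraction


if __name__ == "__main__":
    gain = efficiency_gain(PROPOSED_TOPS_PER_W, BASELINE_TOPS_PER_W)
    baseline_mw = implied_baseline_power_mw(POWER_SAVED_MW, POWER_SAVED_FRACTION)
    print(f"efficiency gain: {gain:.1f}x")
    print(f"energy per op (proposed): {picojoules_per_op(PROPOSED_TOPS_PER_W):.3f} pJ")
    print(f"energy per op (baseline): {picojoules_per_op(BASELINE_TOPS_PER_W):.2f} pJ")
    print(f"implied sub-array power without reuse: {baseline_mw:.0f} mW")
    print(f"implied sub-array power with reuse: {baseline_mw - POWER_SAVED_MW:.0f} mW")
```

Under these assumptions, the two efficiency figures differ by a factor of roughly 71, and the 26% saving implies a sub-array power of roughly 380 mW before leakage reuse is applied.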