Academic Paper

NicePIM: Design Space Exploration for Processing-In-Memory DNN Accelerators With 3-D Stacked-DRAM
Document Type
Periodical
Source
IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems (IEEE Trans. Comput.-Aided Des. Integr. Circuits Syst.), 43(5):1456-1469, May 2024
Subject
Components, Circuits, Devices and Systems
Computing and Processing
Artificial neural networks
Random access memory
Hardware
Computer architecture
Convolution
Space exploration
Parallel processing
Deep neural network (DNN) accelerator
design space exploration (DSE)
processing-in-memory (PIM)
Language
ISSN
0278-0070
1937-4151
Abstract
With the widespread use of deep neural networks (DNNs) in intelligent systems, DNN accelerators with high performance and energy efficiency are in great demand. As one of the feasible processing-in-memory (PIM) architectures, the 3-D stacked-DRAM-based PIM (DRAM-PIM) architecture enables large-capacity memory and low-cost memory access, making it a promising basis for DNN accelerators with better performance and energy efficiency. However, the low access cost of stacked DRAM and its distributed manner of memory access and data storage require rebalancing the hardware design and the DNN mapping. In this article, we propose NicePIM, a framework for efficiently exploring the design space of both the hardware architecture and the DNN mapping of DRAM-PIM-based DNN inference accelerators. NicePIM consists of three key components: 1) PIM-Tuner; 2) PIM-Mapper; and 3) data-scheduler. PIM-Tuner optimizes the hardware configuration, using a DNN model to classify area-compliant PIM-node designs and a deep kernel learning model to identify better hardware parameters. PIM-Mapper explores a variety of DNN mapping configurations, including parallelism between branches of the DNN, DNN layer partitioning, DRAM capacity allocation, and the data layout pattern in DRAM, to generate high-hardware-utilization mapping schemes for various hardware configurations. The data-scheduler employs an integer-linear-programming (ILP)-based data scheduling algorithm to reduce the inter-PIM-node communication overhead caused by the data sharing that DNN layer partitioning introduces. Experimental results demonstrate that NicePIM can effectively optimize hardware configurations for DRAM-PIM systems and can generate high-quality DNN mapping schemes that reduce latency and energy cost by 37% and 28% on average, respectively, compared to the baseline method.
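The abstract describes a two-level search: PIM-Tuner proposes hardware configurations, and PIM-Mapper searches DNN mappings for each candidate. The following is a minimal sketch of that nesting, assuming a toy analytic latency model over a tiny discrete space. All parameter names (pe_rows, pe_cols, buffer_kb, partition, dram_share), the cost constants, and the area budget are hypothetical and are not taken from the paper. NicePIM uses a learned classifier for area compliance and deep kernel learning to steer the hardware search. This sketch swaps in an exact area check and exhaustive enumeration, and it omits the ILP-based data-scheduler entirely.

```python
from dataclasses import dataclass
from itertools import product

# HYPOTHETICAL: all parameter names, value ranges, and cost constants are
# illustrative only; they show the nested hardware/mapping search structure.

@dataclass(frozen=True)
class HwConfig:
    pe_rows: int    # rows of the PE array in one PIM node
    pe_cols: int    # columns of the PE array in one PIM node
    buffer_kb: int  # on-chip buffer size per PIM node

@dataclass(frozen=True)
class Mapping:
    partition: str     # how a layer is split across PIM nodes
    dram_share: float  # fraction of per-node DRAM reserved for weights

AREA_BUDGET = 100.0    # arbitrary area units
WORK_MACS = 1_000_000  # toy workload size

def area(hw: HwConfig) -> float:
    # Toy area model: PE array plus on-chip buffer
    return hw.pe_rows * hw.pe_cols * 0.05 + hw.buffer_kb * 0.2

def is_area_compliant(hw: HwConfig) -> bool:
    # Exact check standing in for PIM-Tuner's learned area classifier
    return area(hw) <= AREA_BUDGET

def latency(hw: HwConfig, m: Mapping) -> float:
    compute = WORK_MACS / (hw.pe_rows * hw.pe_cols)
    # Arbitrary inter-node communication cost per partitioning choice
    comm = {"by_channel": 3000.0, "by_spatial_tile": 1000.0}[m.partition]
    # Penalty when too little DRAM is reserved for weights
    dram_penalty = 0.0 if m.dram_share >= 0.5 else 2000.0
    # Smaller buffers cause more DRAM traffic
    buffer_penalty = 50_000 / hw.buffer_kb
    return compute + comm + dram_penalty + buffer_penalty

MAPPING_SPACE = [
    Mapping(p, s)
    for p, s in product(("by_channel", "by_spatial_tile"), (0.25, 0.5, 0.75))
]

def best_mapping(hw: HwConfig) -> Mapping:
    # Inner loop (PIM-Mapper role): best mapping for a fixed hardware config
    return min(MAPPING_SPACE, key=lambda m: latency(hw, m))

def explore() -> tuple[float, HwConfig, Mapping]:
    # Outer loop (PIM-Tuner role): only area-compliant hardware is evaluated
    candidates = [
        HwConfig(r, c, b)
        for r, c, b in product((8, 16, 32), (8, 16, 32), (64, 128, 256))
    ]
    results = []
    for hw in candidates:
        if not is_area_compliant(hw):
            continue
        m = best_mapping(hw)
        results.append((latency(hw, m), hw, m))
    # Compare on latency only, so dataclasses never need to be ordered
    return min(results, key=lambda t: t[0])

if __name__ == "__main__":
    lat, hw, m = explore()
    print(lat, hw, m)
```

In this toy space, the largest PE array paired with the largest buffer exceeds the area budget and is skipped before any mapping is evaluated. That mirrors, in a simplified way, why filtering area-compliant designs first saves mapping-search effort.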