학술논문

DDC-PIM: Efficient Algorithm/Architecture Co-Design for Doubling Data Capacity of SRAM-Based Processing-in-Memory
Document Type
Periodical
Source
IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems IEEE Trans. Comput.-Aided Des. Integr. Circuits Syst. Computer-Aided Design of Integrated Circuits and Systems, IEEE Transactions on. 43(3):906-918 Mar, 2024
Subject
Components, Circuits, Devices and Systems
Computing and Processing
Computer architecture
FCC
Convolution
Computational modeling
Microprocessors
SRAM cells
Parallel processing
Algorithm/architecture co-design
doubling data capacity
processing-in-memory (PIM)
SRAM-PIM
Language
ISSN
0278-0070
1937-4151
Abstract
Processing-in-memory (PIM), as a novel computing paradigm, provides significant performance benefits from the aspect of effective data movement reduction. SRAM-based PIM has been demonstrated as one of the most promising candidates due to its endurance and compatibility. However, the integration density of SRAM-based PIM is much lower than other nonvolatile memory-based ones, due to its inherent 6T structure for storing a single bit. Within comparable area constraints, SRAM-based PIM exhibits notably lower capacity. Thus, aiming to unleash its capacity potential, we propose DDC-PIM, an efficient algorithm/architecture co-design methodology that effectively doubles the equivalent data capacity. At the algorithmic level, we propose a filter-wise complementary correlation (FCC) algorithm to obtain a bitwise complementary pair. At the architecture level, we exploit the intrinsic cross-coupled structure of 6T SRAM to store the bitwise complementary pair in their complementary states $(Q/\overline {Q})$ , thereby maximizing the data capacity of each SRAM cell. The dual-broadcast input structure and reconfigurable unit support both depthwise and pointwise convolution, adhering to the requirements of various neural networks. Evaluation results show that DDC-PIM yields about $2.84\times $ speedup on MobileNetV2 and $2.69\times $ on EfficientNet-B0 with negligible accuracy loss compared with PIM baseline implementation. Compared with state-of-the-art SRAM-based PIM macros, DDC-PIM achieves up to $8.41\times $ and $2.75\times $ improvement in weight density and area efficiency, respectively.