Academic Paper

34.8 A 22nm 16Mb Floating-Point ReRAM Compute-in-Memory Macro with 31.2TFLOPS/W for AI Edge Devices
Document Type
Conference
Source
2024 IEEE International Solid-State Circuits Conference (ISSCC), vol. 67, pp. 580-582, Feb. 2024
Subject
Bioengineering
Communication, Networking and Broadcast Technologies
Engineered Materials, Dielectrics and Plasmas
Photonics and Electrooptics
Robotics and Control Systems
Phase change materials
Program processors
Nonvolatile memory
Interference
In-memory computing
Batteries
System-on-chip
Language
English
ISSN
2376-8606
Abstract
AI-edge devices demand high-precision computation (e.g., FP16 and BF16) for accurate inference in practical applications, while maintaining high energy efficiency (EF) and low standby power to prolong battery life. Thus, advanced non-volatile AI-edge processors [1, 2] require non-volatile compute-in-memory (nvCIM) [3–5] with a large non-volatile on-chip memory, to store all of the neural network's parameters (weight data) during power-off, and high-precision, high-EF multiply-and-accumulate (MAC) operations during compute, to maximize battery life. Among nvCIMs, ReRAM-nvCIM stands out as a promising candidate due to its lowest cost-per-bit (vs. MRAM, PCM, and eFlash), large on-off ratio, and resilience to magnetic-field interference. However, existing nvCIM macros [3–5] do not support floating-point (FP) computation. Implementing an FP-MAC for nvCIM faces challenges, as shown in Fig. 34.8.1, in (1) balancing the weight pre-alignment bit width tradeoff between accuracy and storage, (2) addressing the long latency and energy consumption of MAC operations caused by the high input bit width of FP formats, and (3) managing the high array current consumed when accessing numerous memory cells (MCs) for FP operations, particularly low-resistance-state (LRS) ReRAM cells.
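The weight pre-alignment tradeoff in challenge (1) can be illustrated with a short numerical sketch. The general idea behind FP-MAC in CIM macros (not the paper's specific circuit, which is not detailed in this abstract) is to decompose FP weights into sign, exponent, and mantissa, shift all mantissas in a group to a shared maximum exponent, and then run the MAC as a pure fixed-point operation inside the array; mantissa bits shifted past the stored bit width are lost, which is exactly the accuracy-versus-storage tension the abstract names. The `mant_bits` parameter and function names below are illustrative assumptions, not from the paper.

```python
import math

def prealign_weights(weights, mant_bits=11):
    """Shift all weight mantissas to the group's max exponent.

    Returns integer mantissas plus the shared exponent, so the MAC
    can run in the fixed-point domain. Bits shifted below the
    mant_bits window are truncated (the storage/accuracy tradeoff).
    """
    # frexp(w) gives (m, e) with w = m * 2**e and |m| in [0.5, 1).
    exps = [math.frexp(w)[1] for w in weights if w != 0.0]
    e_max = max(exps) if exps else 0
    aligned = []
    for w in weights:
        if w == 0.0:
            aligned.append(0)
            continue
        m, e = math.frexp(w)
        # Quantize the mantissa to mant_bits, then right-shift by the
        # exponent gap; the shifted-out bits are the alignment loss.
        aligned.append(int(round(m * (1 << mant_bits))) >> (e_max - e))
    return aligned, e_max

def fp_mac(inputs, weights, mant_bits=11):
    """Dot product using pre-aligned integer weight mantissas."""
    aligned, e_max = prealign_weights(weights, mant_bits)
    acc = sum(x * a for x, a in zip(inputs, aligned))
    # Rescale the fixed-point accumulation back to FP.
    return acc * math.ldexp(1.0, e_max - mant_bits)
```

With an 11-bit mantissa window (FP16-like), `fp_mac([1.0, 2.0, 0.5, 3.0], [0.75, -0.125, 1.5, 0.0])` recovers the exact dot product 1.25; shrinking `mant_bits` saves storage but truncates more of the small-exponent weights, degrading accuracy.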