Journal Article

H3DAtten: Heterogeneous 3-D Integrated Hybrid Analog and Digital Compute-in-Memory Accelerator for Vision Transformer Self-Attention
Document Type
Periodical
Source
IEEE Transactions on Very Large Scale Integration (VLSI) Systems, 31(10):1592-1602, Oct. 2023
Subject
Components, Circuits, Devices and Systems
Computing and Processing
Transformers
Computational modeling
Windows
Task analysis
Common Information Model (computing)
Hardware
System-on-chip
Artificial intelligence (AI) accelerator
compute-in-memory (CIM)
deep learning (DL)
heterogeneous 3-D integration (H3D)
resistive random access memory (RRAM)
vision transformer
Language
English
ISSN
1063-8210 (print)
1557-9999 (electronic)
Abstract
After the success of transformer networks in natural language processing (NLP), the application of transformers to computer vision (CV) has followed suit, delivering unprecedented performance gains on vision tasks including image recognition and object detection. Multihead self-attention (MHSA) is the key component of transformers, allowing the models to learn how much attention to pay to each input position. Despite its strong modeling capability, MHSA involves complex operations that make transformers prohibitively costly for hardware deployment. Existing acceleration efforts on conventional hardware platforms are challenged by the memory wall. To alleviate the memory wall problem, compute-in-memory (CIM) is a promising solution that stores all model parameters on-chip in compute-capable memory arrays. The footprint of 2-D CIM designs must, however, expand to accommodate increasingly large model sizes. In this work, we present a heterogeneous 3-D integrated (H3D) accelerator targeting the MHSA workloads in vision transformers. H3D integration allows the proposed H3DAtten architecture to combine the merits of resistive random access memory (RRAM)-based analog CIM (ACIM) in 40 nm and static random access memory (SRAM)-based digital CIM (DCIM) in 16 nm. We perform comprehensive signaling and thermal analyses to examine the effects of 3-D stacking on the accelerator. Compared to iso-capacity 2-D baseline designs, the proposed 5-tier H3DAtten accelerator achieves 8.4× higher compute density without accuracy loss on the ImageNet-1k dataset.
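The MHSA operation the abstract describes can be made concrete with a minimal software reference. The sketch below implements standard scaled dot-product multihead self-attention in NumPy; it is not the paper's H3DAtten accelerator mapping, and the random weight matrices (`Wq`, `Wk`, `Wv`, `Wo`) are placeholders standing in for learned parameters.

```python
import numpy as np

def multihead_self_attention(x, num_heads, rng):
    """Minimal scaled dot-product MHSA reference (software only).

    x: (seq_len, d_model) input token embeddings.
    Returns the attended output and the per-head attention weights.
    """
    seq_len, d_model = x.shape
    assert d_model % num_heads == 0
    d_head = d_model // num_heads

    # Random projections stand in for trained Q/K/V/output weights.
    Wq, Wk, Wv, Wo = (rng.standard_normal((d_model, d_model)) / np.sqrt(d_model)
                      for _ in range(4))
    Q, K, V = x @ Wq, x @ Wk, x @ Wv

    # Split each projection into heads: (num_heads, seq_len, d_head).
    def split(m):
        return m.reshape(seq_len, num_heads, d_head).transpose(1, 0, 2)
    Qh, Kh, Vh = split(Q), split(K), split(V)

    # Scaled dot-product scores, then softmax over the key positions.
    scores = Qh @ Kh.transpose(0, 2, 1) / np.sqrt(d_head)
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)

    # Weighted sum of values, concatenate heads, project back to d_model.
    out = (weights @ Vh).transpose(1, 0, 2).reshape(seq_len, d_model)
    return out @ Wo, weights

rng = np.random.default_rng(0)
x = rng.standard_normal((4, 8))                       # 4 tokens, d_model = 8
y, attn = multihead_self_attention(x, num_heads=2, rng=rng)
```

In a CIM accelerator such as the one proposed, the matrix-vector products (`x @ Wq` etc.) are the operations mapped onto the compute-capable memory arrays, since the weights stay stationary while activations stream through.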