Academic Paper

A 118 GOPS/mm² 3D eDRAM TensorCore Architecture for Large-scale Matrix Multiplication
Document Type
Conference
Source
2023 IEEE 30th International Conference on High Performance Computing, Data, and Analytics (HiPC), pp. 61-65, Dec. 2023
Subject
Bioengineering
Communication, Networking and Broadcast Technologies
Computing and Processing
Three-dimensional displays
Computational modeling
Circuits
Mixed reality
Virtual reality
Transformers
Energy efficiency
ML accelerator
matrix multiplication
monolithic 3D
eDRAM
Compute-in-memory
CIM
Language
English
ISSN
2640-0316
Abstract
The computational demands of recent large transformer-based language models and Neural Radiance Fields (NeRF) have rapidly increased, impacting applications like conversational AI and Mixed Reality (MR). Current accelerator architectures struggle to cope with these vast computational requirements, creating a widening gap with slowly growing hardware resources. This paper proposes repurposing memory components as high-density computational units, leveraging recent advancements in Back-End-Of-Line (BEOL) transistors and monolithic 3D integration techniques. An ultra-high-density monolithic 3D eDRAM is presented as a reconfigurable matrix multiplication unit, co-designed with analog computation circuits, achieving energy efficiency up to 2.41 TOPS/W, performance up to 1.71 TOPS on bfloat16, and compute density up to 118 GOPS/mm². A comprehensive multi-cube (core) architecture is also devised and optimized with a bit-stationary TensorCore dataflow. We evaluate the proposed architecture on state-of-the-art machine learning models, NeRF and LLaMA-7B, improving computation density by up to 6.59x and 1.12x compared with a GPU and state-of-the-art vector processor designs, respectively.
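As a back-of-envelope check, the headline figures are mutually consistent: 1.71 TOPS at 118 GOPS/mm² implies roughly 1.71e12 / 118e9 ≈ 14.5 mm² of active compute area. The sketch below is a minimal NumPy analogy of the weight-reuse pattern a bit-stationary dataflow exploits: each weight tile stays pinned (standing in here for an eDRAM compute cube) while activations stream past it. The tile size, function name, and tiling scheme are hypothetical illustrations, not the paper's analog circuit implementation.

```python
# Illustrative sketch only: a functional model of a bit-stationary
# matrix-multiplication dataflow. Each weight tile is "written once"
# (the stationary operand, analogous to a tile held in an eDRAM cube)
# and reused across every streamed activation row. TILE and the
# function name are hypothetical, chosen for this example.
import numpy as np

TILE = 64  # hypothetical size of the weight tile pinned per cube


def bit_stationary_matmul(acts: np.ndarray, weights: np.ndarray) -> np.ndarray:
    """Compute acts @ weights by pinning weight tiles and streaming activations."""
    m, k = acts.shape
    k2, n = weights.shape
    assert k == k2, "inner dimensions must match"
    out = np.zeros((m, n), dtype=acts.dtype)

    # Partition the weight matrix into stationary TILE x TILE tiles.
    for ki in range(0, k, TILE):
        for ni in range(0, n, TILE):
            w_tile = weights[ki:ki + TILE, ni:ni + TILE]  # pinned operand
            # Stream all activation rows past the pinned tile,
            # accumulating partial sums into the output block.
            out[:, ni:ni + TILE] += acts[:, ki:ki + TILE] @ w_tile
    return out


if __name__ == "__main__":
    a = np.random.rand(128, 256).astype(np.float32)
    w = np.random.rand(256, 512).astype(np.float32)
    assert np.allclose(bit_stationary_matmul(a, w), a @ w, atol=1e-3)
```

The point of the stationary operand is arithmetic intensity: each weight tile is loaded once but reused for all `m` activation rows, so expensive memory writes are amortized over many multiply-accumulates, which is what makes repurposed memory arrays attractive as compute units.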