Academic Paper
A 118 GOPS/mm² 3D eDRAM TensorCore Architecture for Large-scale Matrix Multiplication
Document Type
Conference
Source
2023 IEEE 30th International Conference on High Performance Computing, Data, and Analytics (HiPC), pp. 61-65, Dec. 2023
ISSN
2640-0316
Abstract
The computational demands of recent large transformer-based language models and Neural Radiance Fields (NeRF) have rapidly increased, impacting applications such as conversational AI and Mixed Reality (MR). Current accelerator architectures struggle to cope with these vast computational requirements, widening the gap between demand and slowly growing hardware resources. This paper proposes repurposing memory components as high-density computational units, leveraging recent advancements in Back-End-Of-Line (BEOL) transistors and monolithic 3D integration techniques. An ultra-high-density monolithic 3D eDRAM is presented as a reconfigurable matrix multiplication unit, co-designed with analog computation circuits, achieving energy efficiency up to 2.41 TOPS/W, performance up to 1.71 TOPS on bfloat16, and compute density up to 118 GOPS/mm². A comprehensive multi-cube (core) architecture is also devised and optimized with a bit-stationary tensor-core dataflow. We evaluate the proposed architecture on state-of-the-art machine learning models, NeRF and LLaMa-7B, improving computation density by up to 6.59x and 1.12x compared with a GPU and state-of-the-art vector processor designs, respectively.
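The headline figures in the abstract can be cross-checked with simple arithmetic. The sketch below is mine, not from the paper, and assumes the peak throughput, compute density, and energy efficiency all refer to the same operating point, so that implied area is throughput divided by density and implied power is throughput divided by efficiency:

```python
# Back-of-envelope check of the abstract's headline figures.
# Assumption (mine): the three peak numbers describe the same
# configuration, so area ≈ throughput / density and
# power ≈ throughput / efficiency.
peak_tops = 1.71            # peak bfloat16 throughput, TOPS
density_gops_mm2 = 118.0    # compute density, GOPS/mm²
efficiency_tops_w = 2.41    # energy efficiency, TOPS/W

area_mm2 = (peak_tops * 1000) / density_gops_mm2  # implied compute area
power_w = peak_tops / efficiency_tops_w           # implied power at peak

print(f"implied compute area: {area_mm2:.1f} mm²")   # ~14.5 mm²
print(f"implied peak power:   {power_w:.2f} W")      # ~0.71 W
```

These derived values are not stated in the record; they merely show the reported metrics are mutually consistent for a small, low-power compute tile.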