Academic Paper
A 118 GOPS/mm² 3D eDRAM TensorCore Architecture for Large-scale Matrix Multiplication
Document Type
Conference
Source
2023 IEEE 30th International Conference on High Performance Computing, Data, and Analytics (HiPC), pp. 61-65, Dec. 2023
ISSN
2640-0316
Abstract
The computational demands of recent large transformer-based language models and Neural Radiance Fields (NeRF) have rapidly increased, impacting applications such as conversational AI and Mixed Reality (MR). Current accelerator architectures struggle to cope with these vast computational requirements, widening the gap between demand and slowly growing hardware resources. This paper proposes repurposing memory components as high-density computational units, leveraging recent advancements in Back-End-Of-Line (BEOL) transistors and monolithic 3D integration techniques. An ultra-high-density monolithic 3D eDRAM is presented as a reconfigurable matrix multiplication unit, co-designed with analog computation circuits, achieving energy efficiency up to 2.41 TOPS/W, performance up to 1.71 TOPS on bfloat16, and compute density up to 118 GOPS/mm². A comprehensive multi-cube (core) architecture is also devised and optimized with a bit-stationary tensor-core dataflow. We evaluate the proposed architecture on state-of-the-art machine learning models, NeRF and LLaMa-7B, improving computation density by up to 6.59x and 1.12x compared with a GPU and state-of-the-art vector processor designs, respectively.
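The headline figures in the abstract can be cross-checked with simple arithmetic. The sketch below is mine, not from the paper, and assumes the peak throughput, compute density, and energy efficiency all refer to the same operating point, so that implied area is throughput divided by density and implied power is throughput divided by efficiency:

```python
# Back-of-envelope check of the abstract's headline figures.
# Assumption (mine): the three peak numbers describe the same
# configuration, so area ≈ throughput / density and
# power ≈ throughput / efficiency.
peak_tops = 1.71            # peak bfloat16 throughput, TOPS
density_gops_mm2 = 118.0    # compute density, GOPS/mm²
efficiency_tops_w = 2.41    # energy efficiency, TOPS/W

area_mm2 = (peak_tops * 1000) / density_gops_mm2  # implied compute area
power_w = peak_tops / efficiency_tops_w           # implied power at peak

print(f"implied compute area: {area_mm2:.1f} mm²")   # ~14.5 mm²
print(f"implied peak power:   {power_w:.2f} W")      # ~0.71 W
```

These derived values are not stated in the record; they merely show the reported metrics are mutually consistent for a small, low-power compute tile.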