e-Article

Document Type

Conference

Author

Agrawal, Ankur; Lee, Sae Kyu; Silberman, Joel; Ziegler, Matthew; Kang, Mingu; Venkataramani, Swagath; Cao, Nianzheng; Fleischer, Bruce; Guillorn, Michael; Cohen, Matthew; Mueller, Silvia; Oh, Jinwook; Lutz, Martin; Jung, Jinwook; Koswatta, Siyu; Zhou, Ching; Zalani, Vidhi; Bonanno, James; Casatuta, Robert; Chen, Chia-Yu; Choi, Jungwook; Haynie, Howard; Herbert, Alyssa; Jain, Radhika; Kar, Monodeep; Kim, Kyu-Hyoun; Li, Yulong; Ren, Zhibin; Rider, Scot; Schaal, Marcel; Schelm, Kerstin; Scheuermann, Michael; Sun, Xiao; Tran, Hung; Wang, Naigang; Wang, Wei; Zhang, Xin; Shah, Vinay; Curran, Brian; Srinivasan, Vijayalakshmi; Lu, Pong-Fei; Shukla, Sunil; Chang, Leland; Gopalakrishnan, Kailash

Source

2021 IEEE International Solid-State Circuits Conference (ISSCC), vol. 64, pp. 144-146, Feb. 2021

Subject

Bioengineering
Components, Circuits, Devices and Systems
Computing and Processing
Training
Power system management
AI accelerators
Inference algorithms
Solid state circuits
Integrated circuit modeling
Optimization

ISSN

2376-8606

Abstract

Low-precision computation is the key enabling factor to achieve high compute densities (TOPS/W and TOPS/mm²) in AI hardware accelerators across cloud and edge platforms. However, robust deep learning (DL) model accuracy equivalent to high-precision computation must be maintained. Improvements in bandwidth, architecture, and power management are also required to harness the benefit of reduced precision by feeding and supporting more parallel engines to achieve high sustained utilization and optimize performance within a given product power envelope. In this work, we present a 4-core AI chip in 7nm EUV technology that exploits cutting-edge algorithmic advances for iso-accurate models in low-precision training and inference [1, 2] and aggressive circuit/architecture optimization to achieve leading-edge power-performance. The chip supports fp16 (DLFloat16 [8]) and hybrid-fp8 (hfp8) [1] formats for training and inference of DL models, as well as int4 and int2 formats for highly scaled inference.

Pusan National University Library
