Academic Paper
A 7-nm Four-Core Mixed-Precision AI Chip With 26.2-TFLOPS Hybrid-FP8 Training, 104.9-TOPS INT4 Inference, and Workload-Aware Throttling
Document Type
Periodical
Author
Lee, S.K.; Agrawal, A.; Silberman, J.; Ziegler, M.; Kang, M.; Venkataramani, S.; Cao, N.; Fleischer, B.; Guillorn, M.; Cohen, M.; Mueller, S.M.; Oh, J.; Lutz, M.; Jung, J.; Koswatta, S.; Zhou, C.; Zalani, V.; Kar, M.; Bonanno, J.; Casatuta, R.; Chen, C.; Choi, J.; Haynie, H.; Herbert, A.; Jain, R.; Kim, K.; Li, Y.; Ren, Z.; Rider, S.; Schaal, M.; Schelm, K.; Scheuermann, M.R.; Sun, X.; Tran, H.; Wang, N.; Wang, W.; Zhang, X.; Shah, V.; Curran, B.; Srinivasan, V.; Lu, P.; Shukla, S.; Gopalakrishnan, K.; Chang, L.
Source
IEEE Journal of Solid-State Circuits (IEEE J. Solid-State Circuits), 57(1):182-197, Jan. 2022
ISSN
0018-9200
1558-173X
Abstract
Reduced precision computation is a key enabling factor for energy-efficient acceleration of deep learning (DL) applications. This article presents a 7-nm four-core mixed-precision artificial intelligence (AI) chip that supports four compute precisions—FP16, Hybrid-FP8 (HFP8), INT4, and INT2—to support diverse application demands for training and inference. The chip leverages cutting-edge algorithmic advances to demonstrate leading-edge power efficiency for 8-bit floating-point (FP8) training and INT4 inference without model accuracy degradation. A new HFP8 format combined with separation of the floating- and fixed-point pipelines and aggressive circuit/architecture optimization enables performance improvements while maintaining high compute utilization. A high-bandwidth ring protocol enables efficient data communication, while power management using workload-aware clock throttling maximizes performance within a given power budget. The AI chip demonstrates 3.58-TFLOPS/W peak energy efficiency and 26.2-TFLOPS peak performance for HFP8 iso-accuracy training, and 16.9-TOPS/W peak energy efficiency and 104.9-TOPS peak performance for INT4 iso-accuracy inference.