Academic Paper
Fully On-Chip MAC at 14 nm Enabled by Accurate Row-Wise Programming of PCM-Based Weights and Parallel Vector-Transport in Duration-Format
Document Type
Periodical
Author
Narayanan, P.; Ambrogio, S.; Okazaki, A.; Hosokawa, K.; Tsai, H.; Nomura, A.; Yasuda, T.; Mackin, C.; Lewis, S.C.; Friz, A.; Ishii, M.; Kohda, Y.; Mori, H.; Spoon, K.; Khaddam-Aljameh, R.; Saulnier, N.; Bergendahl, M.; Demarest, J.; Brew, K.W.; Chan, V.; Choi, S.; Ok, I.; Ahsan, I.; Lie, F.L.; Haensch, W.; Narayanan, V.; Burr, G.W.
Source
IEEE Transactions on Electron Devices (IEEE Trans. Electron Devices), 68(12):6629-6636, Dec. 2021
ISSN
0018-9383
1557-9646
Abstract
Hardware acceleration of deep learning using analog non-volatile memory (NVM) requires large arrays with high device yield, high-accuracy Multiply-ACcumulate (MAC) operations, and routing frameworks for implementing arbitrary deep neural network (DNN) topologies. In this article, we present a 14-nm test chip for Analog AI inference. It contains multiple arrays of phase change memory (PCM) devices, each array capable of storing 512 $\times$ 512 unique DNN weights and executing massively parallel MAC operations at the location of the data. DNN excitations are transported across the chip using a duration representation on a parallel and reconfigurable 2-D mesh. To accurately transfer inference models to the chip, we describe a closed-loop tuning (CLT) algorithm that programs the four PCM conductances in each weight, achieving