Academic Article

APTPU: Approximate Computing Based Tensor Processing Unit
Document Type
Periodical
Source
IEEE Transactions on Circuits and Systems I: Regular Papers, 69(12):5135-5146, Dec. 2022
Subject
Components, Circuits, Devices and Systems
Computer architecture
Systolic arrays
Adders
Tensors
Neural networks
Deep learning
Machine learning
Hardware accelerators
Approximate computing
tensor processing units
machine learning hardware accelerator
systolic array
Language
English
ISSN
1549-8328 (print)
1558-0806 (electronic)
Abstract
We propose an approximate tensor processing unit (APTPU) with two main components: (1) approximate processing elements (APEs), each consisting of a low-precision multiplier and an approximate adder, and (2) pre-approximate units (PAUs), which are shared among the APEs in the APTPU’s systolic array and serve as steering logic that pre-processes the operands and feeds them to the APEs. We conduct extensive experiments to evaluate the APTPU’s performance across various configurations and workloads. The results show that the APTPU’s systolic array achieves up to 5.2× TOPS/mm² and 4.4× TOPS/W improvements over a conventional systolic array design. A comparison between the proposed APTPU and in-house TPU designs shows approximately 2.5× area and 1.2× power reductions, respectively, while realizing comparable accuracy. Finally, a comparison with state-of-the-art approximate systolic arrays shows that the APTPU can realize up to 1.58×, 2×, and 1.78× reductions in delay, power, and area, respectively, under similar design specifications and synthesis constraints.
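Illustrative Sketch
The abstract describes the dataflow (shared PAUs pre-process operands, which APEs then multiply-accumulate) but not the exact circuits. The Python sketch below is a minimal illustration under stated assumptions: it models the PAU as operand truncation and the APE's approximate adder as an OR-based lower-part adder. All function names, bit widths, and approximation choices here are hypothetical and are not taken from the paper.

# Minimal sketch of an APTPU-style approximate dot product, as one
# systolic-array column might accumulate it. Truncation widths and the
# specific approximations are illustrative assumptions, not the paper's
# actual PAU/APE designs.

def pau_preprocess(x, keep_bits=4, total_bits=8):
    """Pre-Approximate Unit (hypothetical): keep only the top `keep_bits`
    of an unsigned operand so the downstream multiplier is low-precision."""
    drop = total_bits - keep_bits
    return (x >> drop) << drop

def approx_add(a, b, lower_bits=4):
    """Approximate adder (hypothetical): add the high parts exactly but
    OR the low bits together, a common low-power adder approximation."""
    mask = (1 << lower_bits) - 1
    high = ((a >> lower_bits) + (b >> lower_bits)) << lower_bits
    return high | ((a | b) & mask)

def ape_mac(acc, a, b):
    """Approximate Processing Element: low-precision multiply, approx add."""
    return approx_add(acc, a * b)

def aptpu_dot(xs, ws):
    """Dot product of two operand streams. The PAU is shared among APEs,
    so operands are pre-processed once before being streamed in."""
    xs = [pau_preprocess(x) for x in xs]
    ws = [pau_preprocess(w) for w in ws]
    acc = 0
    for a, b in zip(xs, ws):
        acc = ape_mac(acc, a, b)
    return acc

if __name__ == "__main__":
    xs, ws = [200, 55, 130, 17], [90, 210, 33, 128]
    exact = sum(a * b for a, b in zip(xs, ws))
    print("exact:", exact, "approx:", aptpu_dot(xs, ws))

Running the example prints the exact and approximate dot products side by side; raising keep_bits narrows the gap, mirroring the accuracy-versus-area/power trade-off the abstract reports.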