Academic Article

ASLog: An Area-Efficient CNN Accelerator for Per-Channel Logarithmic Post-Training Quantization
Document Type
Article
Source
IEEE Transactions on Circuits and Systems I: Regular Papers; December 2023, Vol. 70, Issue 12, pp. 5380-5393, 14 pp.
Language
English
ISSN
1549-8328; 1558-0806
Abstract
Post-training quantization (PTQ) has proven to be an efficient model compression technique for Convolutional Neural Networks (CNNs), requiring neither re-training nor access to labeled datasets. However, it remains challenging for a CNN accelerator to realize the efficiency potential of PTQ methods. Many PTQ techniques blindly pursue high theoretical compression ratios and accuracy while ignoring their impact on the actual hardware implementation, which incurs more hardware overhead than benefit. This paper introduces ASLog, a PTQ-friendly CNN accelerator that explores four key designs in an algorithm-hardware co-optimization manner: the first practical 4-bit logarithmic PTQ pipeline, SLogII; a multiplier-free arithmetic element (AE) design; an energy-efficient bias correction element (BCE) design; and a per-channel-quantization-friendly (PCF) architecture and dataflow. The proposed SLogII PTQ pipeline pushes the limit of logarithmic PTQ to 4 bits with less than 2.5% accuracy degradation on various image classification and face recognition tasks. Exploiting approximate computing and a novel encoding and decoding scheme, the proposed SLogII AE consumes over 40% less power and area than a common 8-bit multiplier. The BCE and PCF designs proposed in this paper are the first to consider the hardware impact of the widely used per-channel quantization and bias correction techniques, enabling an efficient PTQ-friendly implementation with a small hardware overhead. ASLog is validated in a UMC 40-nm process, achieving 12.2 TOPS/W energy efficiency with a 0.80 mm² core area. ASLog achieves 336.3 GOPS/mm² area efficiency and over 500 OPs/Byte operational intensity, corresponding to more than 1.85× and 1.12× improvements over previous related works.
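Note: the abstract does not spell out the SLogII pipeline itself, but the general idea behind per-channel logarithmic PTQ is to round each weight to a signed power of two per output channel, so that multiplications in hardware reduce to shifts. The sketch below is a minimal, generic NumPy illustration of that idea under stated assumptions, not the authors' SLogII method; the function names (log2_quantize_per_channel, log2_dequantize) and the separate sign/exponent/scale encoding are hypothetical choices for clarity.

    import numpy as np

    def log2_quantize_per_channel(weights, n_bits=4):
        """Generic sketch of per-channel logarithmic (power-of-two) quantization.

        weights: floating-point tensor of shape (out_channels, ...).
        Returns (signs, exponents, scales) such that the dequantized weight is
        sign * scale * 2**(-exponent); the 2**(-exponent) factor maps to a
        right shift in hardware. For simplicity the sign is kept separately;
        a real n-bit code would fold sign and exponent into one field.
        """
        out_ch = weights.shape[0]
        flat = weights.reshape(out_ch, -1)

        signs = np.sign(flat)
        # Per-channel scale: the largest magnitude in each channel maps to exponent 0.
        scales = np.abs(flat).max(axis=1, keepdims=True)
        scales[scales == 0] = 1.0

        # Exponent = round(-log2(|w| / scale)), clipped to the representable range.
        ratio = np.clip(np.abs(flat) / scales, 1e-12, None)
        exponents = np.clip(np.round(-np.log2(ratio)), 0, 2 ** n_bits - 1)

        return signs, exponents.astype(np.int32), scales

    def log2_dequantize(signs, exponents, scales):
        """Reconstruct approximate weights from the logarithmic code."""
        return signs * scales * np.exp2(-exponents.astype(np.float64))

    # Example usage on a toy convolution weight tensor (8 output channels).
    w = np.random.randn(8, 3, 3, 3).astype(np.float32)
    s, e, sc = log2_quantize_per_channel(w, n_bits=4)
    w_hat = log2_dequantize(s, e, sc).reshape(w.shape)

The per-channel scale is what the abstract's PCF architecture must handle efficiently in hardware, since each output channel carries its own scaling factor rather than one shared factor per layer.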