Academic Paper

Genetic Quantization-Aware Approximation for Non-Linear Operations in Transformers

Document Type

Working Paper

Author

Dong, Pingcheng; Tan, Yonghao; Zhang, Dong; Ni, Tianwei; Liu, Xuejiao; Liu, Yu; Luo, Peng; Liang, Luhong; Liu, Shih-Yang; Huang, Xijie; Zhu, Huaiyu; Pan, Yun; An, Fengwei; Cheng, Kwang-Ting

Subject

Computer Science - Machine Learning
Computer Science - Hardware Architecture
Computer Science - Neural and Evolutionary Computing

Abstract

Non-linear functions are prevalent in Transformers and their lightweight variants, where they incur substantial and frequently underestimated hardware costs. Previous state-of-the-art works optimize these operations by piecewise linear approximation and store the parameters in look-up tables (LUTs), but most of them require hardware-unfriendly high-precision arithmetic such as FP32/INT32 and do not consider integer-only INT quantization. This paper proposes a genetic LUT-approximation algorithm, GQA-LUT, that automatically determines the approximation parameters with quantization awareness. The results demonstrate that GQA-LUT incurs negligible degradation on the challenging semantic segmentation task for both vanilla and linear Transformer models. In addition, GQA-LUT enables INT8-based LUT approximation, which achieves area savings of 81.3% to 81.7% and power reductions of 79.3% to 80.2% compared with the high-precision FP32/INT32 alternatives. Code is available at https://github.com/PingchengDong/GQA-LUT.
Comment: 61st ACM/IEEE Design Automation Conference (DAC) 2024
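To make the abstract's core idea concrete, here is a minimal Python sketch; it is not the authors' GQA-LUT code. It runs a genetic search over the breakpoints of a piecewise linear approximation and quantizes each segment's slope and intercept to INT8 fixed point before measuring the error, so the search optimizes the quantized LUT rather than the ideal one. GELU as the target, the [-4, 4] range, fitting each segment through its endpoints, 6 fractional bits, floating-point inputs, and every function name (`build_lut`, `eval_lut`, `quantized_mse`, `genetic_search`) and GA setting are illustrative assumptions.

```python
import math
import random


# Illustrative target only (assumption): exact GELU via the Gaussian CDF.
def gelu(x: float) -> float:
    return 0.5 * x * (1.0 + math.erf(x / math.sqrt(2.0)))


def quantize(v: float, frac_bits: int, bits: int = 8) -> int:
    """Round v to signed fixed point with `frac_bits` fractional bits, clamped to INT`bits`."""
    lo, hi = -(1 << (bits - 1)), (1 << (bits - 1)) - 1
    return max(lo, min(hi, round(v * (1 << frac_bits))))


def build_lut(fn, breakpoints, lo, hi, frac_bits):
    """Fit each segment through its endpoints; store quantized (slope, intercept) pairs."""
    edges = [lo] + sorted(breakpoints) + [hi]
    lut = []
    for a, b in zip(edges[:-1], edges[1:]):
        slope = 0.0 if b - a < 1e-9 else (fn(b) - fn(a)) / (b - a)
        intercept = fn(a) - slope * a
        lut.append((quantize(slope, frac_bits), quantize(intercept, frac_bits)))
    return edges, lut


def eval_lut(x, edges, lut, frac_bits):
    """Pick the segment containing x (inputs outside [lo, hi] use the end segments)."""
    idx = 0
    while idx < len(lut) - 1 and x >= edges[idx + 1]:
        idx += 1
    k, m = lut[idx]
    # Dequantize the integer slope/intercept and evaluate k*x + m.
    return (k * x + m) / (1 << frac_bits)


def quantized_mse(bps, fn, lo, hi, frac_bits, xs):
    """Fitness: mean squared error measured AFTER quantizing the LUT entries."""
    edges, lut = build_lut(fn, bps, lo, hi, frac_bits)
    return sum((eval_lut(x, edges, lut, frac_bits) - fn(x)) ** 2 for x in xs) / len(xs)


def genetic_search(fn, n_bp=7, lo=-4.0, hi=4.0, frac_bits=6, pop=24, gens=40, seed=0):
    rng = random.Random(seed)
    xs = [lo + (hi - lo) * i / 200 for i in range(201)]
    score = lambda bps: quantized_mse(bps, fn, lo, hi, frac_bits, xs)

    # Seed the population with uniform breakpoints plus random sorted candidates.
    uniform = [lo + (hi - lo) * (i + 1) / (n_bp + 1) for i in range(n_bp)]
    population = [uniform] + [
        sorted(rng.uniform(lo, hi) for _ in range(n_bp)) for _ in range(pop - 1)
    ]
    for _ in range(gens):
        population.sort(key=score)
        elite = population[: pop // 4]  # elitism: the best quarter survives unchanged
        children = []
        while len(elite) + len(children) < pop:
            p1, p2 = rng.sample(elite, 2)
            cut = rng.randrange(1, n_bp)  # one-point crossover
            child = p1[:cut] + p2[cut:]
            # Gaussian mutation, clamped to the approximation range.
            child = [min(hi, max(lo, b + rng.gauss(0.0, 0.1))) for b in child]
            children.append(sorted(child))
        population = elite + children
    best = min(population, key=score)
    return best, score(best), score(uniform)


if __name__ == "__main__":
    best, err, uniform_err = genetic_search(gelu)
    print("breakpoints:", [round(b, 2) for b in best])
    print(f"quantized MSE: GA {err:.2e} vs. uniform {uniform_err:.2e}")
```

Because the uniform breakpoints start in the population and the elite is carried forward unchanged, the returned error can never exceed the uniform baseline. The official repository linked in the abstract is the reference for the actual GQA-LUT method.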

Online Access

Open Access (arXiv)