Academic Paper

ITA: An Energy-Efficient Attention and Softmax Accelerator for Quantized Transformers
Document Type
Conference
Source
2023 IEEE/ACM International Symposium on Low Power Electronics and Design (ISLPED), pp. 1-6, Aug. 2023
Subject
Components, Circuits, Devices and Systems
Computing and Processing
Quantization (signal)
Embedded systems
Power demand
Computational modeling
Silicon-on-insulator
Parallel processing
Transformers
neural network accelerators
transformers
attention
softmax
Language
English
Abstract
Transformer networks have emerged as the state-of-the-art approach for natural language processing tasks and are gaining popularity in other domains such as computer vision and audio processing. However, the efficient hardware acceleration of transformer models poses new challenges due to their high arithmetic intensities, large memory requirements, and complex dataflow dependencies. In this work, we propose ITA, a novel accelerator architecture for transformers and related models that targets efficient inference on embedded systems by exploiting 8-bit quantization and an innovative softmax implementation that operates exclusively on integer values. By computing on-the-fly in streaming mode, our softmax implementation minimizes data movement and energy consumption. ITA achieves competitive energy efficiency with respect to state-of-the-art transformer accelerators with 16.9 TOPS/W, while outperforming them in area efficiency with 5.93 TOPS/mm² in 22 nm fully-depleted silicon-on-insulator technology at 0.8 V.
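
The integer-only softmax idea referenced in the abstract can be illustrated with a minimal Python sketch. This is a generic shift-based approximation, not ITA's actual ITAMax/streaming unit, and the function name, constants, and bit-widths below are assumptions chosen for clarity.

    import numpy as np

    def integer_softmax(logits_q, scale, out_bits=8):
        # Hypothetical integer-only softmax sketch; not ITA's exact design.
        # logits_q: quantized attention logits (int8/int32); scale: their
        # quantization step, used only to size the fixed-point constant below.
        x = logits_q.astype(np.int64)

        # Subtract the row maximum so every exponent argument is <= 0;
        # a streaming design would track this maximum on the fly.
        x = x - x.max(axis=-1, keepdims=True)

        # Approximate exp(scale * x) = 2**(x * scale * log2(e)) with a
        # power-of-two right shift. Only the integer part of the exponent is
        # kept here; a real unit could refine the fraction with a small LUT.
        log2e_q8 = int(round(np.log2(np.e) * scale * 256))   # Q8 fixed point
        t = (x * log2e_q8) // 256                             # exponent, <= 0
        shift = np.minimum(-t, 31)
        exp_q16 = (1 << 16) >> shift                          # 2**t in Q16

        # Integer normalization to the requested output bit-width.
        denom = exp_q16.sum(axis=-1, keepdims=True)
        return ((exp_q16 * ((1 << out_bits) - 1)) // denom).astype(np.uint8)

Subtracting the running maximum keeps every exponent non-positive, so the exponential can be realized as a right shift and the whole computation stays in integer arithmetic, which is what allows such a softmax to run in streaming mode without dequantizing intermediate values.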