Academic Paper

Searching Optimal Floating-Point Format for Sub-8-Bit Large Language Model Inference
Document Type
Conference
Source
2024 International Conference on Electronics, Information, and Communication (ICEIC), pp. 1-4, Jan. 2024
Subject
Bioengineering
Communication, Networking and Broadcast Technologies
Components, Circuits, Devices and Systems
Computing and Processing
Engineered Materials, Dielectrics and Plasmas
Fields, Waves and Electromagnetics
Photonics and Electrooptics
Degradation
Quantization (signal)
Natural language processing
Cognition
Task analysis
Large language model
floating-point
post-training quantization
mixed-format
Language
English
ISSN
2767-7699
Abstract
Large Language Models (LLMs) have shown remarkable success in various natural language processing tasks. However, their extensive parameter count leads to significant memory and computational demands. To tackle these challenges, there is growing interest in employing post-training quantization (PTQ) with reduced-precision floating-point (FP) operations. Yet, the optimal FP configuration remains a topic of debate. Existing studies often overlook a thorough analysis of the diverse data distributions found in LLMs and the crucial design choice of denormal (subnormal) representation. In this paper, we conduct a comprehensive examination of the various data distributions within LLMs and the significance of denormals, presenting a mixed-format floating-point framework. Our proposed framework enables sub-8-bit inference with minimal performance degradation on language modeling and reasoning tasks across a broad spectrum of LLMs.
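The record contains only the abstract, so the framework's internals are not specified here. As a rough, hypothetical sketch of what quantizing a tensor to a low-bit FP format with denormal support involves, the Python snippet below rounds values to a format with a chosen number of exponent and mantissa bits, keeping subnormal values and saturating at the largest representable magnitude. The function name quantize_fp and the E4M3/E5M2 parameter choices are illustrative assumptions, not taken from the paper.

```python
import numpy as np

def quantize_fp(x, exp_bits=4, man_bits=3, bias=None):
    """Round each value of x to the nearest number representable in a
    hypothetical low-bit float with exp_bits exponent bits and man_bits
    mantissa bits, including denormal (subnormal) values.
    Illustrative sketch only; not the paper's implementation."""
    if bias is None:
        bias = 2 ** (exp_bits - 1) - 1        # IEEE-style exponent bias
    x = np.asarray(x, dtype=np.float64)
    sign, mag = np.sign(x), np.abs(x)

    max_exp = 2 ** exp_bits - 2 - bias        # top exponent code reserved for inf/NaN
    min_exp = 1 - bias                        # smallest normal exponent
    max_val = (2 - 2.0 ** -man_bits) * 2.0 ** max_exp

    # Binade of each value; magnitudes below the normal range share the
    # subnormal binade, so they all use the same (fixed) quantization step.
    exp = np.floor(np.log2(np.maximum(mag, 2.0 ** min_exp)))
    exp = np.clip(exp, min_exp, max_exp)

    step = 2.0 ** (exp - man_bits)            # spacing within that binade
    q = np.round(mag / step) * step           # round-to-nearest
    q = np.minimum(q, max_val)                # saturate instead of overflowing
    return sign * q

# Example: the same tensor under two candidate sub-8-bit FP formats.
w = np.random.randn(4, 4)
w_e4m3 = quantize_fp(w, exp_bits=4, man_bits=3)   # finer precision
w_e5m2 = quantize_fp(w, exp_bits=5, man_bits=2)   # wider dynamic range
```

Comparing the two calls shows the precision versus dynamic-range trade-off between formats, which is the kind of per-tensor decision a mixed-format framework, as described in the abstract, would have to make for the differing data distributions within an LLM.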