Academic Paper
Lightweight Error Correction for In-Storage Acceleration of Large Language Model Inference
Document Type
Conference
Source
2024 International Conference on Electronics, Information, and Communication (ICEIC), pp. 1-4, Jan. 2024
ISSN
2767-7699
Abstract
As large language models (LLMs) grow in size, conventional GPU-based LLM inference systems face memory bandwidth and capacity limitations. An LLM inference accelerator using NAND flash storage has been proposed to overcome these challenges. However, this approach requires a significant expansion of flash channels to provide adequate bandwidth for inference, which in turn escalates error correction code (ECC) costs. This paper examines the impact of flash memory errors on LLM inference accuracy and explores the feasibility of lightweight ECC by leveraging the inherent error resilience of LLMs. We analyze the impact of 1) masking the high-order bits of FP32 LLM parameters, 2) clipping, and 3) the dependence of error robustness on parameter type, and show that combining these techniques can reduce ECC bandwidth by up to 9.38%.
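To illustrate why the techniques in the abstract can work, the following sketch (not the paper's implementation; function names and the clipping bound are illustrative assumptions) emulates a raw NAND bit flip in a float32 parameter. Because trained LLM weights are typically small in magnitude, a flip in a high-order exponent bit produces a value of astronomical magnitude, which simple clipping to a plausible parameter range can bound without full ECC:

```python
import struct

def flip_bit(x: float, bit: int) -> float:
    """Flip one bit of x's IEEE-754 float32 encoding (emulates a raw NAND read error)."""
    (u,) = struct.unpack("<I", struct.pack("<f", x))
    (y,) = struct.unpack("<f", struct.pack("<I", u ^ (1 << bit)))
    return y

def clip_param(x: float, bound: float = 1.0) -> float:
    """Clip a (possibly corrupted) parameter to [-bound, bound].

    The bound of 1.0 is an illustrative assumption; in practice it would be
    derived from the observed parameter distribution of the model.
    """
    return max(-bound, min(bound, x))

# Flipping bit 30 (the top exponent bit) of 0.5 yields ~1.7e38:
corrupted = flip_bit(0.5, 30)
# Clipping bounds the worst-case perturbation to the parameter range:
repaired = clip_param(corrupted)
```

A flip in a low-order mantissa bit, by contrast, perturbs the value only slightly, which is one intuition behind protecting only the high-order bits (or masking them outright, since their correct values are largely predictable for small-magnitude weights) with lighter-weight ECC.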