Academic Journal Article

Off-Chip Memory Allocation for Neural Processing Units
Document Type
Periodical
Source
IEEE Access, vol. 12, pp. 9931-9939, 2024
Subject
Tensors
Memory management
Resource management
Neural networks
Mathematical models
Deep learning
System-on-chip
NPU
memory allocation
neural network runtime
tiling
strip-packing problem
Language
English
ISSN
2169-3536
Abstract
Many modern Systems-on-Chip (SoCs) are equipped with specialized Machine Learning (ML) accelerators that use both on-chip and off-chip memory to execute neural networks. While on-chip memory usually has a hard limit, off-chip memory is often assumed to be large enough to hold the network’s inputs, outputs, weights, and any intermediate results produced during model execution. This assumption may not hold for edge devices, such as smartphones, which usually cap the amount of memory a single process can use. In this study, we propose a novel approach for minimizing a neural network’s off-chip memory usage by introducing a tile-aware allocator capable of reusing the memory occupied by parts of a tensor before the entire tensor expires. We describe the conditions necessary for such an off-chip memory allocation approach and present results showing that it can reduce peak off-chip memory usage by up to 33% for some common network architectures.
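The abstract frames off-chip tensor placement as a strip-packing problem in which tile-level lifetimes let later buffers reuse memory before the whole producing tensor expires. The sketch below is not the paper's allocator; it is a minimal, hypothetical illustration using a greedy first-fit offset assignment over buffer live ranges, with made-up tensor sizes and time steps, to show how splitting a tensor into independently expiring tiles can lower the peak footprint.

```python
from __future__ import annotations
from dataclasses import dataclass

@dataclass
class Buffer:
    name: str
    size: int        # units of off-chip memory occupied by the buffer
    first_use: int   # time step at which the buffer becomes live
    last_use: int    # last time step at which the buffer is still needed

def overlaps(a: Buffer, b: Buffer) -> bool:
    """Two buffers conflict only if their live ranges intersect in time."""
    return a.first_use <= b.last_use and b.first_use <= a.last_use

def plan(buffers: list[Buffer]) -> int:
    """Greedy first-fit placement in the spirit of strip packing: largest
    buffers first, each at the lowest offset that avoids every buffer it
    overlaps with in time. Returns the resulting peak memory footprint."""
    placed: list[tuple[Buffer, int]] = []   # (buffer, assigned offset)
    for buf in sorted(buffers, key=lambda b: b.size, reverse=True):
        offset, moved = 0, True
        while moved:                         # hop over conflicts until stable
            moved = False
            for other, other_off in placed:
                if (overlaps(buf, other)
                        and offset < other_off + other.size
                        and other_off < offset + buf.size):
                    offset = other_off + other.size
                    moved = True
        placed.append((buf, offset))
    return max(off + b.size for b, off in placed)

# Whole-tensor lifetimes: "act" (4 units) is written at step 0 and stays live
# until its last tile is read at step 4; "out" (4 units) is live from step 1
# until it is read at step 5. Both must coexist, so the peak is 8.
whole = [
    Buffer("act", 4, first_use=0, last_use=4),
    Buffer("out", 4, first_use=1, last_use=5),
]

# Tile-aware lifetimes: tile i of "act" expires right after it is consumed at
# step i + 1, so its space can be reused by "out" tiles produced later on.
tiled = (
    [Buffer(f"act.t{i}", 1, first_use=0, last_use=i + 1) for i in range(4)]
    + [Buffer(f"out.t{i}", 1, first_use=i + 1, last_use=5) for i in range(4)]
)

print("peak, whole tensors:", plan(whole))   # -> 8
print("peak, tile-aware   :", plan(tiled))   # -> 5
```

The printed peaks (8 versus 5 units) come from this fabricated example only; they are not the paper's measurements, and the 33% figure reported in the abstract refers to the authors' own allocator and benchmark networks.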