Academic Article

On-the-Fly Lowering Engine: Offloading Data Layout Conversion for Convolutional Neural Networks
Document Type
Periodical
Source
IEEE Access, 10:79730-79746, 2022
Subject
Aerospace
Bioengineering
Communication, Networking and Broadcast Technologies
Components, Circuits, Devices and Systems
Computing and Processing
Engineered Materials, Dielectrics and Plasmas
Engineering Profession
Fields, Waves and Electromagnetics
General Topics for Engineers
Geoscience
Nuclear Engineering
Photonics and Electrooptics
Power, Energy and Industry Applications
Robotics and Control Systems
Signal Processing and Analysis
Transportation
Convolution
Matrix converters
Convolutional neural networks
Memory management
Hardware
Engines
Computational modeling
Convolutional neural network
GEMM
CPU
Language
English
ISSN
2169-3536
Abstract
Many deep learning frameworks utilize GEneral Matrix Multiplication (GEMM)-based convolution to accelerate CNN execution. GEMM-based convolution is fast but requires a data conversion process called lowering (i.e., im2col), which incurs significant memory overhead and diminishes performance. This paper proposes a novel hardware mechanism, called the On-the-fly Lowering Engine (OLE), to eliminate the lowering overheads. Our goal is to offload the lowering work from GEMM-based convolution. With OLE, the lowered matrix is neither pre-calculated nor stored in main memory. Instead, a hardware engine generates the lowered matrix on the fly from the original input matrix, reducing memory footprint and bandwidth requirements. Furthermore, hardware offloading eliminates the CPU cycles spent on lowering and overlaps computation with lowering to hide the performance overhead. Our evaluation shows that OLE can reduce the memory footprint of convolutional layer inputs to as little as $\frac{1}{12.5}\times$ and the overall memory footprint by up to 33.5%. Moreover, OLE reduces the execution time of convolutional layers by 57.7% on average, resulting in an average speedup of $2.3\times$ for representative CNN models.
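For context, the sketch below is a minimal NumPy illustration of the lowering (im2col) step and the GEMM-based convolution that the abstract refers to; it is not the paper's OLE hardware, and the function names, shapes, and stride handling are illustrative assumptions.

```python
import numpy as np

def im2col(x, kh, kw, stride=1):
    """Lower an input tensor of shape (C, H, W) into the 'lowered matrix':
    each column is one unrolled receptive field of size C*KH*KW."""
    c, h, w = x.shape
    out_h = (h - kh) // stride + 1
    out_w = (w - kw) // stride + 1
    cols = np.empty((c * kh * kw, out_h * out_w), dtype=x.dtype)
    idx = 0
    for i in range(0, h - kh + 1, stride):
        for j in range(0, w - kw + 1, stride):
            cols[:, idx] = x[:, i:i + kh, j:j + kw].ravel()
            idx += 1
    return cols, out_h, out_w

def gemm_conv(x, weights, stride=1):
    """GEMM-based convolution: reshape the filters into a matrix and multiply
    it with the lowered input. Note the lowered matrix replicates each input
    element up to KH*KW times, which is the memory overhead OLE targets."""
    m, c, kh, kw = weights.shape              # M filters of shape (C, KH, KW)
    cols, out_h, out_w = im2col(x, kh, kw, stride)
    w_mat = weights.reshape(m, c * kh * kw)
    out = w_mat @ cols                        # the single GEMM call
    return out.reshape(m, out_h, out_w)

# Example: a 3x32x32 input convolved with eight 3x3 filters.
x = np.random.rand(3, 32, 32).astype(np.float32)
w = np.random.rand(8, 3, 3, 3).astype(np.float32)
y = gemm_conv(x, w)
print(y.shape)  # (8, 30, 30)
```

In a software-only pipeline the `cols` matrix above is materialized in main memory before the GEMM; the paper's proposal is to generate it on the fly in hardware so that neither the extra footprint nor the CPU lowering cycles are paid.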