Academic Paper

Convolutional Acceleration Algorithm Combining Loop Optimization and Automatic Scheduling
Document Type
Conference
Source
2023 International Conference for Advancement in Technology (ICONAT), pp. 1-7, Jan. 2023
Subject
Aerospace
Bioengineering
Communication, Networking and Broadcast Technologies
Components, Circuits, Devices and Systems
Computing and Processing
Engineered Materials, Dielectrics and Plasmas
Fields, Waves and Electromagnetics
Nuclear Engineering
Photonics and Electrooptics
Power, Energy and Industry Applications
Robotics and Control Systems
Signal Processing and Analysis
Keywords
Deep learning
Processor scheduling
Convolution
Computational modeling
Layout
Neural networks
Parallel processing
Convolution computation
Compile optimization
Loop optimization
Automatic scheduling
Language
English
Abstract
Convolutional neural networks are widely used in deep learning, and convolution accounts for the largest share of their computation time, largely determining the performance of the whole network. Accelerating the convolutional layer is therefore important: it improves the running speed of the entire network and thus speeds up the model's forward inference. In this paper, we propose an effective convolution acceleration method that combines traditional compilation techniques with deep learning compilation techniques, starting from the data layout and computation order of the input matrix in convolution operations. First, we combine the data layout of the input matrix in the MEC algorithm with loop unrolling, which enhances the parallelism of the data transformation. Next, we integrate the intermediate-state matrix multiplication with traditional compilation optimizations and with Ansor [22], the automatic scheduler in the deep learning compiler TVM, which effectively improves the speed of the matrix multiplication. Our experiments show that the proposed method effectively accelerates convolution and reduces running time.
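The MEC data layout mentioned in the abstract can be illustrated with a minimal sketch. The code below is not the paper's implementation; it is a hypothetical pure-Python illustration, assuming a single input channel, stride 1, and no padding: the input is lowered into overlapping kernel-width column strips (cheaper in memory than full im2col), and each output row is computed by dotting a shifted slice of every strip with the flattened kernel.

```python
def mec_conv2d(inp, ker):
    """MEC-style 2D convolution sketch (assumed: stride 1, no padding,
    one channel). Lowers the input into kw-wide column strips, then
    multiplies a kh-row window of each strip by the flattened kernel."""
    H, W = len(inp), len(inp[0])
    kh, kw = len(ker), len(ker[0])
    oh, ow = H - kh + 1, W - kw + 1
    # Lowering: one strip per output column, holding an H x kw block of
    # the input flattened row-major. Memory is O(H * kw * ow), smaller
    # than im2col's O(kh * kw * oh * ow) because rows are shared.
    L = [[inp[r][c + j] for r in range(H) for j in range(kw)]
         for c in range(ow)]
    kvec = [v for row in ker for v in row]  # flattened kernel, kh*kw values
    out = [[0.0] * ow for _ in range(oh)]
    for i in range(oh):
        off = i * kw  # moving one output row down shifts the window by kw
        for c in range(ow):
            s = 0.0
            for t in range(kh * kw):  # dot product window . kernel
                s += L[c][off + t] * kvec[t]
            out[i][c] = s
    return out
```

In the paper's setting, the strip-building comprehension is the data-transformation step that loop unrolling parallelizes, and the inner dot-product loops are the matrix multiplication that traditional compiler optimizations and Ansor's automatic scheduling then accelerate.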