Academic Paper

An OpenCL-Based FPGA Accelerator for Compressed YOLOv2
Document Type
Conference
Source
2019 International Conference on Field-Programmable Technology (ICFPT), pp. 235-238, Dec. 2019
Subject
Components, Circuits, Devices and Systems
Computing and Processing
CNN, FPGA, Winograd, OpenCL, model compression
Language
Abstract
Convolutional neural networks (CNNs) are widely used in computer vision applications, and GPUs have been the mainstream accelerators for them. Compared with GPUs, FPGAs offer high flexibility, low power consumption, and abundant DSP resources, which make it possible for them to surpass GPUs in some scenarios. Recent progress in high-level synthesis tools has greatly improved FPGA development efficiency. In this paper, an OpenCL-based CNN accelerator is designed for FPGA, and a variety of model compression techniques are applied to the YOLOv2 model. The accelerator uses the Winograd algorithm to implement convolution efficiently, and it resolves the unaligned global memory accesses that the Winograd algorithm introduces by means of an alignment stream buffer. The design makes full use of the available memory access bandwidth and utilizes all the available DSP resources. Parallelism is exploited along several dimensions for optimal performance. Our FPGA design can reach a latency of 10 ms per image, compared with 15 ms per image on an NVIDIA P100 GPU. We plan to make our design open source so that the community can benefit from it and contribute to it.
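
To illustrate the Winograd idea mentioned in the abstract, here is a minimal sketch of the one-dimensional F(2,3) case, which produces two outputs of a 3-tap filter with four multiplications instead of six. The abstract does not state the tile size or the kernel code used in the paper, so the choice of F(2,3), the plain-Python form, and the function names `winograd_f23`, `direct_conv`, and `winograd_conv1d` are all assumptions made for illustration only.

```python
def winograd_f23(d, g):
    """Two outputs of a 3-tap filter g over a 4-sample tile d (Winograd F(2,3))."""
    # Filter transform: depends only on g, so it can be precomputed once.
    u0 = g[0]
    u1 = (g[0] + g[1] + g[2]) / 2
    u2 = (g[0] - g[1] + g[2]) / 2
    u3 = g[2]
    # Input transform: additions and subtractions only.
    v0 = d[0] - d[2]
    v1 = d[1] + d[2]
    v2 = d[2] - d[1]
    v3 = d[1] - d[3]
    # Element-wise products: the only four general multiplications.
    m0, m1, m2, m3 = u0 * v0, u1 * v1, u2 * v2, u3 * v3
    # Output transform.
    return [m0 + m1 + m2, m1 - m2 - m3]


def direct_conv(x, g):
    """Reference sliding-window filter (no padding), as used in CNN layers."""
    n_out = len(x) - len(g) + 1
    return [sum(x[i + k] * g[k] for k in range(len(g))) for i in range(n_out)]


def winograd_conv1d(x, g):
    """Apply F(2,3) over x using 4-sample tiles that advance by 2 samples.

    Illustrative assumption: len(x) is even and at least 4.
    """
    out = []
    for start in range(0, len(x) - 3, 2):
        out.extend(winograd_f23(x[start:start + 4], g))
    return out


if __name__ == "__main__":
    x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
    g = [1.0, 2.0, 3.0]
    print(winograd_conv1d(x, g))
    print(direct_conv(x, g))
```

In this sketch each tile spans four samples but advances by only two, so consecutive tiles overlap and begin at offsets that are not multiples of the tile width. This is one plausible way that Winograd tiling can lead to unaligned memory reads of the kind the abstract describes; the abstract itself does not detail how the alignment stream buffer works.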