Academic Paper

FPGA-Based CNN Inference Accelerator Synthesized from Multi-Threaded C Software
Document Type
Working Paper
Source
J. H. Kim, B. Grady, R. Lian, J. Brothers and J. H. Anderson, "FPGA-based CNN inference accelerator synthesized from multi-threaded C software," 2017 30th IEEE International System-on-Chip Conference (SOCC), Munich, 2017, pp. 268-273
Subject
Computer Science - Machine Learning
Computer Science - Hardware Architecture
Computer Science - Performance
Computer Science - Programming Languages
Statistics - Machine Learning
Language
English
Abstract
A deep-learning inference accelerator is synthesized from a C-language software program parallelized with Pthreads. The software implementation uses the well-known producer/consumer model with parallel threads interconnected by FIFO queues. The LegUp high-level synthesis (HLS) tool synthesizes threads into parallel FPGA hardware, translating software parallelism into spatial parallelism. A complete system is generated where convolution, pooling and padding are realized in the synthesized accelerator, with remaining tasks executing on an embedded ARM processor. The accelerator incorporates reduced precision, and a novel approach for zero-weight-skipping in convolution. On a mid-sized Intel Arria 10 SoC FPGA, peak performance on VGG-16 is 138 effective GOPS.