Academic Paper

TGPA: Tile-Grained Pipeline Architecture for Low Latency CNN Inference
Document Type
Conference
Source
2018 IEEE/ACM International Conference on Computer-Aided Design (ICCAD), pp. 1-8, Nov. 2018
Subject
Components, Circuits, Devices and Systems
Language
English
ISSN
1558-2434
Abstract
FPGAs have been increasingly used in recent years as reconfigurable hardware accelerators for applications leveraging convolutional neural networks (CNNs). Previous designs normally adopt a uniform accelerator architecture that processes all layers of a given CNN model one after another. This homogeneous design methodology usually suffers from dynamic resource underutilization due to the tensor shape diversity of different layers. As a result, designs equipped with heterogeneous accelerators specific to different layers were proposed to resolve this issue. However, existing heterogeneous designs sacrifice latency for throughput by executing multiple input images concurrently on different accelerators. In this paper, we propose an architecture named Tile-Grained Pipeline Architecture (TGPA) for low-latency CNN inference. TGPA adopts a heterogeneous design which supports pipelined execution of multiple tiles within a single input image on multiple heterogeneous accelerators. The accelerators are partitioned onto different FPGA dies to guarantee high frequency, and a partition strategy is designed to maximize on-chip resource utilization. Experimental results show that TGPA designs for different CNN models achieve up to 40% performance improvement over homogeneous designs and a 3X latency reduction over state-of-the-art designs.
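The latency benefit of tile-grained pipelining claimed in the abstract can be sketched with a toy timing model. All numbers below are illustrative assumptions, not figures from the paper, and the model ignores TGPA's actual scheduling and inter-layer data dependencies; it only contrasts per-image latency when each heterogeneous accelerator processes a whole image at a time versus when tiles of one image stream through the accelerator pipeline.

```python
# Hypothetical per-tile processing times (arbitrary units) for each
# layer's dedicated accelerator -- illustrative values only.
tile_times = [4, 3, 5, 2]   # one heterogeneous accelerator per layer
num_tiles = 8               # tiles per input image

# Image-grained execution: each accelerator finishes all tiles of the
# image before the next layer starts, so per-image latency is the sum
# of the full-layer times.
image_grained_latency = sum(t * num_tiles for t in tile_times)

# Tile-grained pipeline: tiles of the same image stream through the
# accelerators; once the pipeline fills, one tile completes every
# max-stage interval (classic synchronous-pipeline latency formula).
tile_grained_latency = sum(tile_times) + (num_tiles - 1) * max(tile_times)

print(image_grained_latency)  # 112
print(tile_grained_latency)   # 49
```

Under these toy numbers, streaming tiles cuts single-image latency from 112 to 49 units, which is the effect the heterogeneous image-level pipelines described in the abstract give up by overlapping different images instead of tiles of the same image.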