Academic Paper

Overcoming Data Transfer Bottlenecks in FPGA-based DNN Accelerators via Layer Conscious Memory Management
Document Type
Conference
Source
2019 56th ACM/IEEE Design Automation Conference (DAC), pp. 1-6, Jun. 2019
Subject
Components, Circuits, Devices and Systems
Signal Processing and Analysis
Language
Abstract
Deep Neural Networks (DNNs) are becoming increasingly complex. Previous hardware accelerator designs neglect the diversity of layers in terms of computation and communication behavior. On-chip memory resources are underutilized for memory-bound layers, leading to suboptimal performance. In addition, the increasing complexity of DNN structures makes on-chip memory allocation difficult. To address these issues, we propose a layer-conscious memory management framework for FPGA-based DNN hardware accelerators. Our framework exploits layer diversity and the disjoint lifespans of memory buffers to use on-chip memory efficiently, improving the performance of memory-bound layers and thus of the DNN as a whole. It consists of four key techniques that work in coordination. First, a memory allocation algorithm allocates on-chip buffers to the memory-bound layers. Second, buffers are shared between different layers to improve on-chip memory utilization. Finally, buffer prefetching and buffer splitting further reduce latency. Experiments show that our techniques achieve a 1.36× performance improvement over previous designs.
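The abstract does not describe how buffers with disjoint lifespans are actually shared. As a rough illustration only, the Python sketch below shows one common way to exploit disjoint lifespans: a greedy, largest-first, first-fit offset assignment in which buffers that are never live at the same time may occupy the same on-chip address range. The `Buffer` type, the `overlaps` and `assign_offsets` functions, the lifespan encoding (first and last layer index, inclusive), and the buffer sizes are all hypothetical assumptions; this is not the authors' allocation algorithm.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class Buffer:
    # Hypothetical model of an on-chip buffer, not taken from the paper.
    name: str
    size: int   # on-chip storage needed (arbitrary units)
    start: int  # index of the first layer that uses the buffer
    end: int    # index of the last layer that uses the buffer (inclusive)


def overlaps(a: Buffer, b: Buffer) -> bool:
    """Two lifespans overlap if both buffers are live during some layer."""
    return a.start <= b.end and b.start <= a.end


def assign_offsets(buffers: List[Buffer]) -> Tuple[Dict[str, int], int]:
    """Greedy first-fit placement, largest buffers first.

    Each buffer goes at the lowest offset that does not collide with an
    already-placed buffer whose lifespan overlaps its own. Buffers with
    disjoint lifespans are allowed to reuse the same addresses.
    Returns the offset of each buffer and the peak on-chip footprint.
    """
    placed: List[Tuple[Buffer, int]] = []
    offsets: Dict[str, int] = {}
    for buf in sorted(buffers, key=lambda b: b.size, reverse=True):
        # Address ranges occupied by buffers that are live at the same time.
        busy = sorted(
            (off, off + other.size) for other, off in placed if overlaps(buf, other)
        )
        offset = 0
        for lo, hi in busy:
            if offset + buf.size <= lo:
                break  # the buffer fits in the gap before this occupied range
            offset = max(offset, hi)  # otherwise skip past the occupied range
        placed.append((buf, offset))
        offsets[buf.name] = offset
    peak = max((off + b.size for b, off in placed), default=0)
    return offsets, peak


# Toy example with hypothetical buffers spanning a four-layer network.
bufs = [
    Buffer("A", 64, 0, 1),
    Buffer("B", 32, 1, 2),
    Buffer("C", 64, 2, 3),
    Buffer("D", 16, 3, 3),
]
offsets, peak = assign_offsets(bufs)
print(offsets)                          # {'A': 0, 'C': 0, 'B': 64, 'D': 64}
print(peak, sum(b.size for b in bufs))  # 96 176
```

In this toy example, sharing cuts the on-chip footprint from 176 to 96 units because A and C, and likewise B and D, are never live at the same time. Largest-first first-fit is a simple heuristic and is not guaranteed to find the minimum footprint.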