Academic Paper

Automatic support for multi-module parallelism from computational patterns
Document Type
Conference
Source
2015 25th International Conference on Field Programmable Logic and Applications (FPL), pp. 1-8, Sep. 2015
Subject
Components, Circuits, Devices and Systems
Computing and Processing
Kernel
Hardware
Synchronization
Parallel processing
Field programmable gate arrays
Optimization
Bandwidth
Language
ISSN
1946-147X
1946-1488
Abstract
Field Programmable Gate Arrays (FPGAs) can be customized into application-specific architectures to achieve high performance and energy efficiency. Unfortunately, they have yet to gain significant adoption by application developers because of their low-level programming model. Moreover, to obtain good performance in an FPGA design, one often needs to parallelize the computation correctly and balance the computational throughput with the available data access bandwidth. To address the programming model problem, recent efforts have focused on composing applications out of parallel computational patterns, such as map, reduce, zipWith and foreach, and leveraging the properties of these patterns to generate highly parallel hardware modules capable of high performance. In this work, we focus on further improving performance and show that knowledge of how these computational patterns consume and produce data, combined with information about the system architecture, can be used to automatically parallelize computations across multiple hardware modules. To achieve this, we automatically infer the synchronization needs that arise from parallelization and generate a complete system that can obtain high performance for a given application. We evaluate our approach using seven applications from different domains and show that our automatically generated designs achieve performance improvements ranging from 1.8 to 9.4 times.
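As a rough illustration of the idea in the abstract, the following Python sketch shows how knowledge of a pattern's data access lets a computation be split across several workers. It is not the paper's tool flow or its generated hardware. The function names (`split`, `parallel_map`, `parallel_reduce`, `dot_product`) and the `n_modules` parameter are hypothetical, and ordinary lists stand in for hardware modules. The sketch assumes the reduction operator is associative, so the partial results can be combined in a final step, which stands in for the synchronization point the abstract mentions.

```python
from functools import reduce
from operator import add


def split(data, n_modules):
    """Partition data into n_modules contiguous chunks whose sizes differ by at most 1."""
    base, extra = divmod(len(data), n_modules)
    chunks, start = [], 0
    for i in range(n_modules):
        size = base + (1 if i < extra else 0)
        chunks.append(data[start:start + size])
        start += size
    return chunks


def parallel_map(f, data, n_modules):
    """map: elements are independent, so each (hypothetical) module handles one chunk."""
    parts = [[f(x) for x in chunk] for chunk in split(data, n_modules)]
    # Concatenating the per-module outputs restores the original order.
    return [y for part in parts for y in part]


def parallel_reduce(op, identity, data, n_modules):
    """reduce: each module produces a partial result; combining them is the sync step."""
    partials = [reduce(op, chunk, identity) for chunk in split(data, n_modules)]
    return reduce(op, partials, identity)


def dot_product(xs, ys, n_modules):
    """zipWith-style multiply followed by a sum reduction, split across n_modules."""
    products = parallel_map(lambda pair: pair[0] * pair[1], list(zip(xs, ys)), n_modules)
    return parallel_reduce(add, 0, products, n_modules)
```

For example, `dot_product([1, 2, 3, 4], [5, 6, 7, 8], 3)` gives the same result as a sequential dot product, because the map stage has no cross-element dependency and the reduction operator is associative.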