Academic paper

Workload Partitioning Strategy for Improved Parallelism on FPGA-CPU Heterogeneous Chips
Document Type
Conference
Source
2018 28th International Conference on Field Programmable Logic and Applications (FPL), pp. 376-3764, Aug. 2018
Subject
Computing and Processing
Field programmable gate arrays
Throughput
Performance evaluation
Pipeline processing
Benchmark testing
Task analysis
Hardware
scheduling
heterogeneous
parallelisation
throughput
energy
power
FPGA
ARM
SoC
ZCU102
Language
ISSN
1946-1488
Abstract
In heterogeneous computing, efficient parallelism can be obtained if every device runs the same task on a different portion of the dataset. This requires a scheduler that assigns data chunks to compute units in proportion to their throughputs. For FPGA-CPU heterogeneous devices, achieving the best possible overall throughput requires the scheduler to accurately evaluate the differing performance behaviour of the compute devices. In this article, we propose a scheduler which first detects, with negligible overhead, the highest throughput each device can obtain for a specific application and then partitions the dataset for improved performance. To demonstrate the efficiency of this method, we choose a Zynq UltraScale+ ZCU102 device as the hardware target and parallelise four applications, showing that the developed scheduler can provide up to 94.06% of the throughput achievable under ideal conditions, with comparable power and energy consumption.
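
The proportional assignment mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' scheduler: the function name `partition_proportional`, the device labels, and the throughput figures are all hypothetical, and the throughputs are assumed to have been measured already. The sketch only shows how a dataset of a given size might be split so that each device receives a share proportional to its throughput, using largest-remainder rounding so that the shares sum exactly to the dataset size.

```python
def partition_proportional(total_items, throughputs):
    """Split total_items among devices in proportion to their throughputs.

    throughputs maps a (hypothetical) device name to a measured throughput,
    e.g. items per second. Returns a dict of integer item counts per device.
    """
    if total_items < 0:
        raise ValueError("total_items must be non-negative")
    if any(tp < 0 for tp in throughputs.values()):
        raise ValueError("throughputs must be non-negative")
    total_tp = sum(throughputs.values())
    if total_tp <= 0:
        raise ValueError("at least one throughput must be positive")

    # Ideal (fractional) share for each device.
    exact = {dev: total_items * tp / total_tp for dev, tp in throughputs.items()}
    # Round down first, then hand the leftover items to the devices
    # with the largest fractional remainders.
    shares = {dev: int(x) for dev, x in exact.items()}
    leftover = total_items - sum(shares.values())
    by_remainder = sorted(exact, key=lambda dev: exact[dev] - shares[dev], reverse=True)
    for dev in by_remainder[:leftover]:
        shares[dev] += 1
    return shares


if __name__ == "__main__":
    # Assumed figures for illustration only: the FPGA is three times faster than the CPU.
    print(partition_proportional(1000, {"fpga": 3.0, "cpu": 1.0}))
```

With these assumed throughputs, both devices would finish their chunks at about the same time, which is the condition under which proportional partitioning maximises overall throughput.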