학술논문

DML: Dynamic Partial Reconfiguration With Scalable Task Scheduling for Multi-Applications on FPGAs

Document Type

Periodical

Author

Dhar, A.; Richter, E.; Yu, M.; Zuo, W.; Wang, X.; Kim, N.S.; Chen, D.

Source

IEEE Transactions on Computers IEEE Trans. Comput. Computers, IEEE Transactions on. 71(10):2577-2591 Oct, 2022

Subject

Computing and Processing
Field programmable gate arrays
Task analysis
Pipeline processing
Schedules
Dynamic scheduling
Runtime
Processor scheduling
Partial reconfiguration
integer linear programming
scheduling
FPGA
dynamic reconfiguration

Language

ISSN

0018-9340
1557-9956
2326-3814

Abstract

For several new applications, FPGA-based computation has shown better latency and energy efficiency compared to CPU or GPU-based solutions. We note two clear trends in FPGA-based computing. On the edge, the complexity of applications is increasing, requiring more resources than possible on today's edge FPGAs. In contrast, in the data center, FPGA sizes have increased to the point where multiple applications must be mapped to fully utilize the programmable fabric. While these limitations affect two separate domains, they both can be dealt with by using dynamic partial reconfiguration (DPR). Thus, there is a renewed interest to deploy DPR for FPGA-based hardware. In this work, we present Doing More with Less (DML) – a methodology for scheduling heterogeneous tasks across an FPGA's resources in a resource efficient manner while effectively hiding the latency of DPR. With the help of an integer linear programming (ILP) based scheduler, we demonstrate the mapping of diverse computational workloads in both cloud and edge-like scenarios. Our novel contributions include: enabling IP-level pipelining and parallelization to exploit the parallelism available within batches of work in our scheduler, and strategies to map and run multiple applications simultaneously. We consider the application of our methodology on real world benchmarks on both small (a Zedboard) and large (a ZCU106) FPGAs, across different workload batching and multiple-application scenarios. Our evaluation proves the real world efficacy of our solution, and we demonstrate an average speedup of 5X and up to 7.65X on a ZCU106 over a bulk-batching baseline via our scheduling strategies. We also demonstrate the scalablity of our scheduler by simultaneously mapping multiple applications to a single FPGA, and explore different approaches to sharing FPGA resources between applications.

Online Access

Full Text (IEEE) Web of Science JCR 저널정보 Scopus Find it@PNU

이메일

부산대학교 도서관

Online Access

메일 발송