Academic Paper

DML: Dynamic Partial Reconfiguration With Scalable Task Scheduling for Multi-Applications on FPGAs
Document Type
Periodical
Source
IEEE Transactions on Computers (IEEE Trans. Comput.). 71(10):2577-2591, Oct. 2022
Subject
Computing and Processing
Field programmable gate arrays
Task analysis
Pipeline processing
Schedules
Dynamic scheduling
Runtime
Processor scheduling
Partial reconfiguration
integer linear programming
scheduling
FPGA
dynamic reconfiguration
Language
ISSN
0018-9340
1557-9956
2326-3814
Abstract
For several new applications, FPGA-based computation has shown better latency and energy efficiency than CPU- or GPU-based solutions. We note two clear trends in FPGA-based computing. On the edge, the complexity of applications is increasing, requiring more resources than are available on today's edge FPGAs. In contrast, in the data center, FPGA sizes have increased to the point where multiple applications must be mapped to fully utilize the programmable fabric. While these limitations affect two separate domains, both can be addressed with dynamic partial reconfiguration (DPR). Thus, there is renewed interest in deploying DPR for FPGA-based hardware. In this work, we present Doing More with Less (DML), a methodology for scheduling heterogeneous tasks across an FPGA's resources in a resource-efficient manner while effectively hiding the latency of DPR. With the help of an integer linear programming (ILP) based scheduler, we demonstrate the mapping of diverse computational workloads in both cloud and edge-like scenarios. Our novel contributions include enabling IP-level pipelining and parallelization to exploit the parallelism available within batches of work in our scheduler, and strategies to map and run multiple applications simultaneously. We apply our methodology to real-world benchmarks on both small (a Zedboard) and large (a ZCU106) FPGAs, across different workload batching and multiple-application scenarios. Our evaluation demonstrates the real-world efficacy of our solution: our scheduling strategies achieve an average speedup of 5X and up to 7.65X on a ZCU106 over a bulk-batching baseline. We also demonstrate the scalability of our scheduler by simultaneously mapping multiple applications to a single FPGA, and explore different approaches to sharing FPGA resources between applications.
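
The idea of hiding DPR latency can be illustrated with a toy timing model. The sketch below is not the paper's ILP formulation; it is a simple greedy model under stated assumptions: tasks form a linear chain, slots are assigned round-robin, a single configuration port serializes all reconfigurations, and the `Task` fields, time units, and `makespan` helper are hypothetical names introduced only for illustration.

```python
from dataclasses import dataclass


@dataclass
class Task:
    name: str
    reconfig: int  # assumed time to load the task's partial bitstream
    execute: int   # assumed compute time once the slot is configured


def makespan(tasks, num_slots):
    """Finish time of a task chain on `num_slots` reconfigurable slots.

    Assumptions (illustrative only): task i uses slot i % num_slots,
    one configuration port handles reconfigurations in task order,
    and each task depends on the previous one finishing.
    """
    port_free = 0   # time at which the configuration port is idle again
    exec_end = []   # execution finish time of each task
    for i, task in enumerate(tasks):
        # The slot is reusable once its previous occupant has finished.
        slot_free = exec_end[i - num_slots] if i >= num_slots else 0
        cfg_end = max(port_free, slot_free) + task.reconfig
        port_free = cfg_end
        # Execution waits for both the bitstream and the predecessor.
        prev_done = exec_end[i - 1] if i > 0 else 0
        exec_end.append(max(cfg_end, prev_done) + task.execute)
    return exec_end[-1] if exec_end else 0


chain = [Task("A", 2, 5), Task("B", 2, 5), Task("C", 2, 5)]
print(makespan(chain, num_slots=1))  # 21: every reconfiguration is exposed
print(makespan(chain, num_slots=2))  # 17: only the first reconfiguration is exposed
```

With a second slot, the next task's bitstream is loaded while the current task runs, so only the first reconfiguration contributes to the total time in this model. An ILP-based scheduler such as the one the abstract describes would instead choose slot assignments and start times by optimization rather than by a fixed round-robin rule.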