Journal Article

Fast and Inexpensive High-Level Synthesis Design Space Exploration: Machine Learning to the Rescue
Document Type
Periodical
Source
IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, 42(11):3939-3950, Nov. 2023
Subject
Components, Circuits, Devices and Systems
Computing and Processing
Field programmable gate arrays
Behavioral sciences
Transfer learning
Predictive models
Licenses
Space exploration
Metaheuristics
ASIC
design space exploration
FPGA
high-level synthesis (HLS)
machine learning
Language
English
ISSN
0278-0070 (print)
1937-4151 (electronic)
Abstract
High-level synthesis (HLS) has multiple significant advantages over traditional RT-level design flows. One in particular that we address in this work is the ability to generate multiple functionally equivalent design variants with unique tradeoffs, such as area, performance, and power, from the same behavioral description. This is typically done by setting synthesis options in the form of pragmas (comments), which mainly control how to synthesize arrays (RAM or registers), loops (unroll, partially unroll, no unroll, or pipeline), and functions (inline or not). Different pragma combinations lead to different design implementations. Out of all the pragma combinations, the designer is typically only interested in those that lead to the Pareto-optimal designs (PODs). Fortunately, this search can be automated, but unfortunately, the search space for these pragma combinations grows supra-linearly with the number of pragma settings. Thus, fast and efficient heuristics are needed. These heuristics generate a new pragma combination and then evaluate its effect by synthesizing (HLS) it. The most time-consuming part of this process is executing a full HLS run on the behavioral description for every new pragma combination. One obvious way to accelerate the exploration is to parallelize it using a multithreaded heuristic, whose theoretical speedup should match the number of parallel threads. The main problem with this approach is that every HLS invocation requires checking out an HLS tool license, which is not released until the synthesis process has finished. This implies that the maximum number of parallel threads is restricted by the number of available licenses, which in the ASIC case are extremely expensive. In contrast, FPGA vendors make their HLS tools available for free. It is therefore tempting to investigate whether FPGA HLS tools can be used to find the PODs in the ASIC case. To address this, in this work we present a dedicated multithreaded parallel HLS design space explorer (DSE) based on transfer learning that accelerates HLS DSE for ASICs by first targeting FPGAs and then using machine learning to map the FPGA exploration results to their optimal ASIC equivalents. Experimental results show the effectiveness and robustness of our approach.
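The flow summarized in the abstract can be illustrated with a minimal sketch: explore pragma combinations cheaply with FPGA HLS runs, synthesize only a handful of points with the license-bound ASIC flow, learn a mapping from FPGA metrics to ASIC metrics, and report the predicted Pareto-optimal designs. This is not the authors' tool; the pragma space, the fpga_hls/asic_hls stubs, and the random-forest mapping below are illustrative assumptions showing the shape of such a transfer-learning DSE loop.

```python
# Illustrative sketch of a transfer-learning HLS DSE loop (not the paper's implementation).
import itertools
import random
from sklearn.ensemble import RandomForestRegressor

# Hypothetical pragma knobs: loop unrolling, array mapping, function inlining.
PRAGMA_SPACE = {
    "unroll": [1, 2, 4, 8],
    "array":  ["ram", "registers"],
    "inline": [True, False],
}

def fpga_hls(cfg):
    """Placeholder for a (free) FPGA HLS run returning (area, latency).
    A real explorer would invoke the vendor HLS tool here."""
    rnd = random.Random(hash(tuple(sorted(cfg.items()))))
    area = cfg["unroll"] * (2.0 if cfg["array"] == "registers" else 1.0) + rnd.random()
    latency = 16.0 / cfg["unroll"] + (0.5 if cfg["inline"] else 1.5) + rnd.random()
    return area, latency

def asic_hls(cfg):
    """Placeholder for an expensive, license-limited ASIC HLS run."""
    a, l = fpga_hls(cfg)
    return 3.1 * a + 0.2, 0.9 * l + 0.1  # pretend ASIC results correlate with FPGA

def pareto(points):
    """Keep (cfg, area, latency) tuples not dominated in both objectives."""
    front = []
    for p in points:
        dominated = any(
            q[1] <= p[1] and q[2] <= p[2] and (q[1] < p[1] or q[2] < p[2])
            for q in points
        )
        if not dominated:
            front.append(p)
    return front

# 1) Explore the pragma space (here exhaustively, in practice heuristically)
#    using only free FPGA HLS runs, possibly in parallel threads.
configs = [dict(zip(PRAGMA_SPACE, v)) for v in itertools.product(*PRAGMA_SPACE.values())]
fpga_results = [(c, *fpga_hls(c)) for c in configs]

# 2) Synthesize only a few training points with the ASIC flow (few licenses needed)
#    and fit a model mapping FPGA metrics -> ASIC metrics (the transfer step).
train = random.sample(fpga_results, 4)
X = [[a, l] for _, a, l in train]
y = [asic_hls(c) for c, _, _ in train]
model = RandomForestRegressor(n_estimators=50, random_state=0).fit(X, y)

# 3) Predict ASIC metrics for every explored point and report the estimated
#    Pareto-optimal designs without further ASIC synthesis runs.
predicted = [(c, *model.predict([[a, l]])[0]) for c, a, l in fpga_results]
for cfg, area, lat in pareto(predicted):
    print(cfg, f"est. ASIC area={area:.2f}, latency={lat:.2f}")
```

In this sketch the expensive ASIC tool is invoked only for the small training set, so the number of required licenses stays fixed regardless of how many pragma combinations are explored on the FPGA side.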