Academic Paper

Application Specific Approximate Behavioral Processor
Document Type
Periodical
Source
IEEE Transactions on Sustainable Computing (IEEE Trans. Sustain. Comput.), 8(2):165-179, Jun. 2023
Subject
Computing and Processing
Logic gates
Behavioral sciences
Optimization
Process control
Registers
Hardware
Very large scale integration
Approximate computing
bespoke behavioral processors
high-level synthesis
ISSN
2377-3782
2377-3790
Abstract
Many applications require simple controllers that continuously run the same workload. These applications are often found in battery-operated embedded systems that must be ultra-low power (ULP) and are very price sensitive. Examples include various kinds of IoT devices and medical devices. Currently, these systems rely on off-the-shelf general-purpose microprocessors. One problem with using these processors is that a specific application does not need all of their resources. Furthermore, because the workloads running on these systems are regular, there is a large opportunity to optimize the processor by pruning the unused resources to achieve lower area (cost) and power. Moreover, these processors can be specified at the behavioral level, with High-Level Synthesis (HLS) used to generate an efficient Register Transfer Level (RTL) description. This enables additional optimizations, because the processor implementation is fully re-optimized during the HLS process. In addition, many applications running on these embedded systems, such as image processing and digital signal processing (DSP) applications, tolerate imprecise outputs. This opens the door to further optimizations in the context of approximate computing. To address these issues, this work presents a methodology that automatically customizes a behavioral RISC processor for a given workload so that its area and power are significantly reduced compared to the original general-purpose processor. The methodology first generates a bespoke processor that produces exactly the same output as the original general-purpose one, and then approximates it, allowing a certain level of error at the output. Compared to previous work that customizes a given processor only at the gate-netlist level, the proposed method shows significant benefits.
In particular, this work shows that raising the level of abstraction reduces area and power by 78.3% and 70.1%, respectively, on average for the exact solution, and that the approximate versions, tolerating a maximum output error of 10% and 20%, reduce area by an additional 10.0% and 16.5%, respectively.
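The two-step flow described in the abstract (first prune what the workload never uses, then trade output accuracy for further savings under an error bound) can be illustrated with a minimal Python sketch. This is not the authors' HLS-based method: the unit names, the opcode trace, the helpers `prune_unused`, `mean_relative_error` and `greedy_approximate`, the greedy selection strategy, and the mean-relative-error metric are all hypothetical assumptions chosen only for illustration.

```python
# Hypothetical sketch of workload-driven customization followed by
# error-bounded approximation. NOT the paper's HLS flow: unit names,
# helpers, the greedy strategy and the error metric are assumptions.
from typing import Callable, List, Sequence, Set, Tuple

# A candidate approximation, modeled (by assumption) as a function that
# maps the exact output values to the approximated output values.
Transform = Callable[[List[float]], List[float]]


def prune_unused(units: Set[str], trace: Sequence[str]) -> Set[str]:
    """Exact step: keep only the units that the workload trace exercises."""
    return units & set(trace)


def mean_relative_error(exact: Sequence[float], approx: Sequence[float]) -> float:
    """Mean of |exact - approx| / |exact| over nonzero reference values."""
    pairs = [(e, a) for e, a in zip(exact, approx) if e != 0]
    if not pairs:
        return 0.0
    return sum(abs(e - a) / abs(e) for e, a in pairs) / len(pairs)


def greedy_approximate(
    exact_out: List[float],
    transforms: Sequence[Tuple[str, Transform]],
    max_error: float,
) -> Tuple[List[str], List[float]]:
    """Approximate step: apply candidates one at a time and keep each only
    if the cumulative output error stays within max_error."""
    current = list(exact_out)
    accepted: List[str] = []
    for name, transform in transforms:
        trial = transform(current)
        if mean_relative_error(exact_out, trial) <= max_error:
            current = trial
            accepted.append(name)
    return accepted, current


if __name__ == "__main__":
    units = {"add", "sub", "mul", "div", "load", "store", "branch"}
    trace = ["load", "mul", "add", "store", "branch", "load", "add"]
    print(sorted(prune_unused(units, trace)))  # ['add', 'branch', 'load', 'mul', 'store']

    exact = [100.0, 200.0, 400.0]
    candidates = [
        # Zero the 3 low-order bits of the integer part (non-negative inputs).
        ("truncate_3_lsbs", lambda xs: [x // 8 * 8 for x in xs]),
        # Deliberately aggressive candidate that should be rejected.
        ("halve_outputs", lambda xs: [x / 2 for x in xs]),
    ]
    print(greedy_approximate(exact, candidates, max_error=0.10))
    # (['truncate_3_lsbs'], [96.0, 200.0, 400.0])
```

Because each accepted approximation changes the output that later candidates build on, the greedy loop checks the cumulative error against the exact output rather than judging each candidate in isolation. A real flow would also weigh each candidate's area and power savings, which this sketch omits.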