Academic Article

Weave: Abstraction and Integration Flow for Accelerators of Generated Modules
Document Type
Periodical
Author
Source
IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems (IEEE Trans. Comput.-Aided Des. Integr. Circuits Syst.), 43(3):854-867, Mar. 2024
Subject
Components, Circuits, Devices and Systems
Computing and Processing
Generators
Memory management
Productivity
Space exploration
Measurement
Integrated circuit modeling
Design methodology
Accelerator
generator
integration
modular abstraction
Language
ISSN
0278-0070
1937-4151
Abstract
Domain-specific accelerators require numerous functional components to execute complex applications in their target domain. The conventional approach to developing them efficiently is to decompose the design into modules, implement the modules, and integrate them. Over the past decade, generator-based methods have been shown to improve the productivity of module implementation. However, current abstractions make it difficult to integrate generator-implemented modules because of implicit interface definitions, nonunified performance modeling, and fragmented memory management. These limitations lower the productivity of the integration process and degrade the performance of the integrated accelerators. To overcome these drawbacks, we propose Weave, an abstraction for integrating generated modules and an agile design flow for domain-specific accelerators. The Weave abstraction guides module implementation and integration with a unified performance model and memory management. Furthermore, the Weave integration flow, consisting of generation, selection, and integration phases, optimizes the performance of the integrated accelerator with a design space exploration algorithm and hierarchical memory management. In experiments, the accelerator developed with Weave achieves $1.93\times$ higher performance in the deep learning domain than an open-source accelerator, and the integrated accelerator maintains performance across applications with different memory access patterns.