Academic Paper

A Multi-Memory Field-Programmable Custom Computing Machine for Accelerating Compute-Intensive Applications
Document Type
Conference
Source
2021 IEEE 12th Annual Ubiquitous Computing, Electronics & Mobile Communication Conference (UEMCON), pp. 619-628, Dec. 2021
Subject
Communication, Networking and Broadcast Technologies
Components, Circuits, Devices and Systems
Computing and Processing
Fields, Waves and Electromagnetics
General Topics for Engineers
Power, Energy and Industry Applications
Robotics and Control Systems
Signal Processing and Analysis
Multicore processing
Memory architecture
Logic gates
Mobile communication
Software
Resource management
Hardware acceleration
Reconfigurable Computing
Field-Programmable Gate Arrays
Field-Programmable Custom Computing Machines
High Performance Computing
Multi-Memory Architecture
Multiple Memory Banks
Abstract
In this paper, we present an FPGA-based multi-memory controller for accelerating computationally intensive applications. Our architecture accepts multiple inputs and produces multiple outputs in each clock cycle. It includes processor cores with pipelined functional units tailored to each application. We also present an approach that achieves a speedup of one to two orders of magnitude over a traditional software implementation executing on a conventional multi-core processor. Even though the clock frequency of the Field-Programmable Custom Computing Machine (FCCM) is an order of magnitude lower than that of a conventional multi-core processor, the FCCM is significantly faster because it processes multiple inputs and produces multiple outputs in every clock cycle. We used the Power function as a case study to demonstrate the merits of our FCCM. In our experiments, we executed the Power function in software and compared the software execution times with the execution time of the FCCM. We also compared the FCCM execution time with that of an OpenMP implementation of the function. Our experiments show that, for the Power function, our multi-memory architecture is 57X faster than the software implementation and 17X faster than the OpenMP implementation.
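Why a slower clock can still win comes down to throughput: steady-state results per second equal the clock frequency times the number of results retired per cycle, so an engine with a 10x slower clock is still faster whenever it retires more than 10x as many results per cycle. The sketch below is a hypothetical, idealized throughput model, not the authors' method. The names `Engine`, `execution_time`, and `speedup` are illustrative, and every number is invented for demonstration and is NOT a measurement from the paper. It ignores memory stalls and host-to-FPGA transfer time. The only extra cost it models is an optional pipeline-fill latency.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Engine:
    """Idealized steady-state throughput model of a compute engine (hypothetical)."""
    name: str
    clock_hz: float           # clock frequency in Hz
    results_per_cycle: float  # results retired per clock cycle once the pipeline is full

    def throughput(self) -> float:
        """Results per second in steady state."""
        return self.clock_hz * self.results_per_cycle

    def execution_time(self, n_results: int, fill_latency_cycles: int = 0) -> float:
        """Seconds to produce n_results, including an optional pipeline-fill latency."""
        cycles = fill_latency_cycles + n_results / self.results_per_cycle
        return cycles / self.clock_hz


def speedup(baseline: Engine, accel: Engine, n_results: int,
            accel_fill_latency_cycles: int = 0) -> float:
    """Ratio of baseline execution time to accelerator execution time."""
    return (baseline.execution_time(n_results)
            / accel.execution_time(n_results, accel_fill_latency_cycles))


if __name__ == "__main__":
    # Illustrative numbers only; they are NOT measurements from the paper.
    cpu = Engine("CPU core", clock_hz=2.0e9, results_per_cycle=0.125)  # 1 result every 8 cycles
    # Assumed: 8 pipelined units, each fed by its own memory banks, 1 result each per cycle.
    fccm = Engine("FCCM", clock_hz=200e6, results_per_cycle=8)
    n = 1_000_000
    print(f"{fccm.name} clock is {cpu.clock_hz / fccm.clock_hz:.0f}x slower")
    print(f"speedup: {speedup(cpu, fccm, n):.1f}x")
```

With these made-up parameters, the FCCM clock is 10x slower, yet its throughput is 8 x 200 MHz = 1.6 x 10^9 results/s versus 0.125 x 2 GHz = 2.5 x 10^8 results/s, so the model predicts a 6.4x speedup. A multi-core or OpenMP baseline could be approximated by multiplying the per-core `results_per_cycle` by the core count, which assumes ideal parallel scaling.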