Academic Article

A Low-Power Transprecision Floating-Point Cluster for Efficient Near-Sensor Data Analytics.

Document Type

Article

Author

Montagna, Fabio; Mach, Stefan; Benatti, Simone; Garofalo, Angelo; Ottavi, Gianmarco; Benini, Luca; Rossi, Davide; Tagliavini, Giuseppe

Source

IEEE Transactions on Parallel & Distributed Systems. May 2022, Vol. 33, Issue 5, pp. 1038-1053. 16 pp.

Subject

*Floating-point arithmetic
*Computer workstation clusters
*Space exploration
*Energy consumption
*Open source software
*Parallel programming

Language

ISSN

1045-9219

Abstract

Recent applications in low-power (1-20 mW) near-sensor computing require the adoption of floating-point arithmetic to reconcile high-precision results with a wide dynamic range. In this article, we propose a low-power multi-core computing cluster that leverages the fine-grained tunable principles of transprecision computing to provide support to near-sensor applications at a minimum power budget. Our solution, based on the open-source RISC-V architecture, combines parallelization and sub-word vectorization with a dedicated interconnect design capable of sharing floating-point units (FPUs) among the cores. On top of this architecture, we provide full-fledged software stack support, including a parallel low-level runtime, a compilation toolchain, and a high-level programming model, with the aim of supporting the development of end-to-end applications. We performed an exhaustive exploration of the design space of the transprecision cluster on a cycle-accurate FPGA emulator, varying the number of cores and FPUs to maximize performance. Orthogonally, we performed a vertical exploration to identify the most efficient solutions in terms of non-functional requirements (operating frequency, power, and area). We conducted an experimental assessment on a set of benchmarks representative of the near-sensor processing domain, complementing the timing results with a post place-&-route analysis of the power consumption. A comparison with the state of the art shows that our solution outperforms the competitors in energy efficiency, reaching a peak of 97 Gflop/s/W on single-precision scalars and 162 Gflop/s/W on half-precision vectors. Finally, a real-life use case demonstrates the effectiveness of our approach in fulfilling accuracy constraints. [ABSTRACT FROM AUTHOR]
