Academic Paper

Manticore: A 4096-Core RISC-V Chiplet Architecture for Ultraefficient Floating-Point Computing
Document Type
Periodical
Source
IEEE Micro. 41(2):36-42, Apr. 2021
Subject
Computing and Processing
Bandwidth
Computer architecture
Floating-point arithmetic
Hardware
Uplink
Energy efficiency
Semantics
Encoding
Language
ISSN
0272-1732
1937-4143
Abstract
Data-parallel problems demand ever-growing floating-point (FP) operations per second under tight area- and energy-efficiency constraints. In this work, we present Manticore, a general-purpose, ultraefficient chiplet-based architecture for data-parallel FP workloads. We have manufactured a prototype of the chiplet’s computational core in the GlobalFoundries 22FDX process and demonstrate more than 5x improvement in energy efficiency on FP-intensive workloads compared to CPUs and GPUs. The compute capability at high energy and area efficiency is provided by Snitch clusters (“Snitch: A tiny pseudo dual-issue processor for area and energy efficient execution of floating-point intensive workloads,” IEEE Trans. Comput.), each containing eight small integer cores, each of which controls a large floating-point unit (FPU). The core supports two custom ISA extensions. The stream semantic registers (SSRs) extension elides explicit load and store instructions by encoding them as register reads and writes (“Stream semantic registers: A lightweight RISC-V ISA extension achieving full compute utilization in single-issue cores,” IEEE Trans. Comput.). The floating-point repetition extension decouples the integer core from the FPU, allowing floating-point instructions to be issued independently. These two extensions allow the single-issue core to minimize its instruction fetch bandwidth and saturate the instruction bandwidth of the FPU, achieving FPU utilization above 90%, with more than 40% of core area dedicated to the FPU.
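The SSR idea described in the abstract, where memory accesses are encoded as register reads instead of explicit load instructions, can be illustrated with a toy software model. The following Python sketch is not the authors' implementation or the real ISA. The class `StreamRegister`, the functions `dot_baseline` and `dot_streamed`, the register names `ft0` and `ft1`, and the instruction counts are all hypothetical and serve only to show why eliding loads reduces the number of issued instructions. Loop overhead and stream setup are ignored, and the floating-point repetition extension is not modeled.

```python
class StreamRegister:
    """Toy model (assumption): each read returns the next element of a memory stream."""

    def __init__(self, memory, base, stride, length):
        self.memory = memory
        self.addr = base
        self.stride = stride
        self.remaining = length

    def read(self):
        # A register read implicitly performs the load and advances the address.
        if self.remaining == 0:
            raise IndexError("stream exhausted")
        value = self.memory[self.addr]
        self.addr += self.stride
        self.remaining -= 1
        return value


def dot_baseline(memory, a_base, b_base, n):
    """Dot product with explicit loads; returns (result, counted instructions)."""
    acc = 0.0
    issued = 0
    for i in range(n):
        x = memory[a_base + i]  # explicit load
        y = memory[b_base + i]  # explicit load
        acc += x * y            # multiply-accumulate
        issued += 3             # 2 loads + 1 FP operation (illustrative count)
    return acc, issued


def dot_streamed(memory, a_base, b_base, n):
    """Dot product where operands arrive through stream register reads."""
    ft0 = StreamRegister(memory, a_base, 1, n)  # hypothetical register name
    ft1 = StreamRegister(memory, b_base, 1, n)  # hypothetical register name
    acc = 0.0
    issued = 0
    for _ in range(n):
        acc += ft0.read() * ft1.read()  # loads are implicit in the reads
        issued += 1                     # 1 FP operation (illustrative count)
    return acc, issued


memory = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
baseline_result, baseline_issued = dot_baseline(memory, 0, 4, 4)
streamed_result, streamed_issued = dot_streamed(memory, 0, 4, 4)
```

In this model both variants compute the same dot product, but the streamed variant counts only the floating-point operations, which mirrors the abstract's claim that eliding loads and stores lets a single-issue core keep its FPU busy.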