Academic Paper

CVA6 RISC-V Virtualization: Architecture, Microarchitecture, and Design Space Exploration
Document Type
Periodical
Source
IEEE Transactions on Very Large Scale Integration (VLSI) Systems (IEEE Trans. VLSI Syst.), 31(11):1713-1726, Nov. 2023
Subject
Components, Circuits, Devices and Systems
Computing and Processing
Virtualization
Virtual machine monitors
Hardware
Microarchitecture
Memory management
Registers
Space exploration
CVA6
design space exploration (DSE)
hypervisor
memory management unit (MMU)
microarchitecture
RISC-V
translation lookaside buffer (TLB)
virtualization
Language
ISSN
1063-8210
1557-9999
Abstract
Virtualization is a key technology used in a wide range of applications, from cloud computing to embedded systems. Over the last few years, mainstream computer architectures have been extended with hardware virtualization support, giving rise to a set of virtualization technologies (e.g., Intel VT and Arm VE) that are now proliferating in modern processors and systems on chip (SoCs). In this article, we describe our work on hardware virtualization support in the RISC-V CVA6 core. Our contribution is multifold and encompasses architecture, microarchitecture, and design space exploration (DSE). In particular, we highlight the design of a set of microarchitectural enhancements [i.e., G-stage translation lookaside buffer (GTLB) and second-level TLB (L2 TLB)] to alleviate the virtualization performance overhead. We also perform a DSE and accompanying postlayout simulations (based on 22-nm FDX technology) to assess performance, power, and area (PPA). Furthermore, we map design variants on a field-programmable gate array (FPGA) platform (Genesys 2) to assess the functional performance–area tradeoff. Based on the DSE, we select an optimal design point for the CVA6 with hardware virtualization support. For this optimal hardware configuration, we collected functional performance results by running the MiBench benchmark on Linux atop the Bao hypervisor in a single-core configuration. We observed a performance speedup of up to 16% (approximately 12.5% on average) compared with the virtualization-aware nonoptimized design, at a minimal cost of 0.78% in area and 0.33% in power. Finally, all work described in this article is publicly available and open-sourced so that the community can evaluate additional design configurations and software stacks.