Academic Paper

HERMES: Hardware-Efficient Speculative Dataflow Architecture for Bonsai Merkle Tree-Based Memory Authentication
Document Type
Conference
Source
2021 IEEE International Symposium on Hardware Oriented Security and Trust (HOST), pp. 203-213, Dec. 2021
Subject
Components, Circuits, Devices and Systems
Computing and Processing
Technological innovation
Nonvolatile memory
Memory management
Throughput
Hardware
Computer crashes
Security
Language
Abstract
Emerging byte-addressable Non-Volatile Memory (NVM) technology promises superior memory density and ultra-low energy consumption, but it poses unique challenges to guaranteeing memory confidentiality, integrity, and crash consistency. Extensive research has therefore been conducted on transparently protecting memory through FPGA-implemented middleware that deploys encryption, authentication/integrity verification, and replay-attack protection. The Bonsai Merkle tree (BMT) has been proven highly effective in guaranteeing memory integrity and protecting against replay attacks. However, when used in a strictly persistent trusted execution environment (TEE), BMT-based memory integrity protection severely bottlenecks memory performance, because properly maintaining a BMT requires a deep traversal of the hash tree for every counter update. In this paper, we propose HERMES, a hardware-efficient memory integrity engine specifically designed to deliver a crash-consistent BMT for NVM. HERMES can process multiple outstanding counter requests in flight and significantly improves both the latency and throughput of all BMT operations by leveraging an asynchronous dataflow architecture and speculative execution. HERMES incorporates three architectural innovations: (1) speculative control logic and a speculative temporary buffer designed for and deployed at each level; (2) an optimized hardware component that verifies all BMT levels in parallel, together with an algorithm that adapts to the caching status of the BMT levels; (3) a formalized message format transferred between BMT levels to accommodate both counter operations within a unified architecture in which each level can behave adaptively. Evaluated with the Shuhai memory bandwidth benchmark, HERMES achieved up to 7.9x higher throughput and up to 3.5x shorter latency than the state-of-the-art ARES, at the cost of 2x the resource utilization.
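
The bottleneck the abstract describes, a traversal to the root for every counter update, can be illustrated with a small software model. The sketch below is not the HERMES design or any code from the paper: the class name `CounterTree`, the arity of 8, the 8-byte counter encoding, and the use of SHA-256 are all assumptions chosen for illustration. It only shows that one counter update costs one hash per tree level when done sequentially, and that a replayed (stale) counter no longer matches the trusted root.

```python
import hashlib

ARITY = 8  # assumed fan-out; real designs choose this from block and hash sizes


def h(data: bytes) -> bytes:
    """Hash helper (SHA-256 is an illustrative choice, not the paper's)."""
    return hashlib.sha256(data).digest()


def encode(counter: int) -> bytes:
    """Assumed 8-byte little-endian counter encoding."""
    return counter.to_bytes(8, "little")


class CounterTree:
    """Toy hash tree over counters; levels[0] holds leaf hashes, levels[-1] the root."""

    def __init__(self, num_counters: int, arity: int = ARITY):
        self.arity = arity
        self.counters = [0] * num_counters
        self.levels = [[h(encode(c)) for c in self.counters]]
        # Build parent levels until a single root hash remains.
        while len(self.levels[-1]) > 1:
            prev = self.levels[-1]
            self.levels.append(
                [h(b"".join(prev[i:i + arity])) for i in range(0, len(prev), arity)]
            )

    @property
    def root(self) -> bytes:
        return self.levels[-1][0]

    def update(self, index: int, value: int) -> int:
        """Write a counter and rehash its path to the root; returns hashes computed."""
        self.counters[index] = value
        self.levels[0][index] = h(encode(value))
        hashes = 1
        for lvl in range(1, len(self.levels)):
            index //= self.arity
            start = index * self.arity
            children = self.levels[lvl - 1][start:start + self.arity]
            self.levels[lvl][index] = h(b"".join(children))
            hashes += 1
        return hashes

    def verify(self, index: int) -> bool:
        """Recompute the path from the stored counter and compare with the root."""
        node = h(encode(self.counters[index]))
        for lvl in range(1, len(self.levels)):
            parent = index // self.arity
            start = parent * self.arity
            children = list(self.levels[lvl - 1][start:start + self.arity])
            children[index - start] = node
            node = h(b"".join(children))
            index = parent
        return node == self.root


tree = CounterTree(64)            # 64 counters with arity 8 gives 3 levels
cost = tree.update(5, 1)          # one hash per level on the path to the root
ok_after_update = tree.verify(5)
tree.counters[5] = 0              # simulate a replay of the stale counter value
ok_after_replay = tree.verify(5)
```

In this model the per-update cost grows with the tree height, and each update is a strictly sequential chain of hashes. That serial dependency is what the abstract's speculative, per-level dataflow approach is aimed at; the sketch makes no attempt to model that approach, caching, or crash consistency.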