Academic Paper

SmartDIMM: In-Memory Acceleration of Upper Layer Protocols
Document Type
Conference
Source
2024 IEEE International Symposium on High-Performance Computer Architecture (HPCA), pp. 312-329, Mar. 2024
Subject
Computing and Processing
Transport protocols
Performance evaluation
Random access memory
Computer architecture
Bandwidth
Programming
Servers
Processing near-memory
Datacenter-scale systems and architectures
End-host Networking
Language
English
ISSN
2378-203X
Abstract
There has been significant focus on offloading upper-layer network protocols (ULPs) to accelerators located on CPUs and SmartNICs. However, restricting accelerator placement to these locations limits both the variety of ULPs that can be accelerated and the overall performance. In particular, it overlooks the opportunity to accelerate ULPs that run atop a stateful transport protocol in the face of high cache contention: at high network rates, the frequent DRAM accesses and SmartNIC-CPU synchronizations outweigh the benefits of hardware acceleration. This work introduces SmartDIMM, which unlocks the opportunity to accelerate ULPs that run atop stateful transport protocols and primarily operate on data stored in DRAM. We prototyped SmartDIMM using Samsung's AxDIMM and implemented end-to-end offloading of (de/en)cryption and (de)compression, two ULPs widely employed in datacenters. We then compared the performance of SmartDIMM with accelerator placements on the CPU, SmartNIC, and PCIe cards. Our results demonstrate that ULP offloading on SmartDIMM outperforms CPU-, SmartNIC-, and PCIe-based offload configurations. Compared to a server executing (de/en)cryption and (de)compression on the CPU, SmartDIMM achieves 21.0% to 10.28× higher requests per second and 36.3% to 88.9% lower memory bandwidth utilization.
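
To make the offloaded work concrete, the following is a minimal C sketch of the per-request ULP datapath the abstract describes, as it would run on the CPU baseline: compress-then-encrypt on transmit, decrypt-then-decompress on receive. The specific algorithm choices (AES-256-GCM via OpenSSL, DEFLATE via zlib) and all function names here are illustrative assumptions, not the paper's actual implementation; SmartDIMM's contribution is performing this work near memory rather than on CPU cores.

/* Hypothetical CPU-baseline sketch of the two ULPs the paper offloads:
 * DEFLATE (de)compression via zlib and AES-256-GCM (de/en)cryption via
 * OpenSSL. Build: cc ulp_path.c -lz -lcrypto */
#include <openssl/evp.h>
#include <zlib.h>
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Encrypt len bytes of in with AES-256-GCM; returns ciphertext length. */
static int gcm_encrypt(const unsigned char *in, int len,
                       const unsigned char *key, const unsigned char *iv,
                       unsigned char *out, unsigned char *tag) {
    EVP_CIPHER_CTX *ctx = EVP_CIPHER_CTX_new();
    int outlen = 0, tmplen = 0;
    EVP_EncryptInit_ex(ctx, EVP_aes_256_gcm(), NULL, key, iv);
    EVP_EncryptUpdate(ctx, out, &outlen, in, len);
    EVP_EncryptFinal_ex(ctx, out + outlen, &tmplen);
    EVP_CIPHER_CTX_ctrl(ctx, EVP_CTRL_GCM_GET_TAG, 16, tag);
    EVP_CIPHER_CTX_free(ctx);
    return outlen + tmplen;
}

/* Decrypt and authenticate; returns plaintext length. */
static int gcm_decrypt(const unsigned char *in, int len,
                       const unsigned char *key, const unsigned char *iv,
                       unsigned char *tag, unsigned char *out) {
    EVP_CIPHER_CTX *ctx = EVP_CIPHER_CTX_new();
    int outlen = 0, tmplen = 0;
    EVP_DecryptInit_ex(ctx, EVP_aes_256_gcm(), NULL, key, iv);
    EVP_DecryptUpdate(ctx, out, &outlen, in, len);
    EVP_CIPHER_CTX_ctrl(ctx, EVP_CTRL_GCM_SET_TAG, 16, tag);
    assert(EVP_DecryptFinal_ex(ctx, out + outlen, &tmplen) == 1);
    EVP_CIPHER_CTX_free(ctx);
    return outlen + tmplen;
}

int main(void) {
    unsigned char key[32] = {0}, iv[12] = {0}, tag[16];
    const char *msg = "response payload destined for the NIC";
    uLong srclen = (uLong)strlen(msg) + 1;

    /* TX path: compress, then encrypt the compressed bytes. */
    unsigned char comp[256], enc[256];
    uLongf complen = sizeof(comp);
    assert(compress(comp, &complen, (const Bytef *)msg, srclen) == Z_OK);
    int enclen = gcm_encrypt(comp, (int)complen, key, iv, enc, tag);

    /* RX path: decrypt, then decompress, recovering the original bytes. */
    unsigned char dec[256], out[256];
    int declen = gcm_decrypt(enc, enclen, key, iv, tag, dec);
    uLongf outlen = sizeof(out);
    assert(uncompress(out, &outlen, dec, (uLong)declen) == Z_OK);
    printf("round trip ok: %s\n", out);
    return 0;
}

Every byte of this datapath is read from and written back to DRAM, which is why, per the abstract, executing it on the CPU or a SmartNIC at high network rates incurs the DRAM traffic and synchronization costs that near-memory placement avoids.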