학술논문

CAGC: A Content-aware Garbage Collection Scheme for Ultra-Low Latency Flash-based SSDs
Document Type
Conference
Source
2021 IEEE International Parallel and Distributed Processing Symposium (IPDPS) IPDPS Parallel and Distributed Processing Symposium (IPDPS), 2021 IEEE International. :162-171 May, 2021
Subject
Bioengineering
Communication, Networking and Broadcast Technologies
Components, Circuits, Devices and Systems
Computing and Processing
Performance evaluation
Distributed processing
System performance
Prototypes
Reliability
Ultra-Low Latency Flash-based SSDs
Garbage Collection
Data Deduplication
Reference Count
Data Placement
Language
ISSN
1530-2075
Abstract
With the advent of new flash-based memory technologies with ultra-low latency, directly applying inline data deduplication in flash-based storage devices can degrade the system performance since key deduplication operations lie on the shortened critical write path of such devices. To address the problem, we propose a Content-Aware Garbage Collection scheme (CAGC), which embeds the data deduplication into the data movement workflow of the Garbage Collection (GC) process in ultra-low latency flash-based SSDs. By parallelizing the operations of valid data pages migration, hash computing and flash block erase, the deduplication-induced performance overhead is alleviated and redundant page writes during the GC period are eliminated. To further reduce data writes and write amplification during GC, CAGC separates and stores data pages in different regions based on their reference counts. The performance evaluation of our CAGC prototype implemented in FlashSim shows that CAGC significantly reduces the number of flash blocks erased and data pages migrated during GC, leading to improved user I/O performance and reliability of ultra-low latency flash-based SSDs.