학술논문

I/O Causality Based In-Line Data Deduplication for Non-Volatile Memory Enabled Storage Systems
Document Type
Periodical
Source
IEEE Transactions on Computers IEEE Trans. Comput. Computers, IEEE Transactions on. 73(5):1327-1340 May, 2024
Subject
Computing and Processing
Nonvolatile memory
Redundancy
Costs
Throughput
Indexing
Cause effect analysis
Random access memory
Data deduplication
I/O causality
non-volatile memory
Language
ISSN
0018-9340
1557-9956
2326-3814
Abstract
Data deduplication technologies are widely exploited to reduce capacity demands for storage. Previous chunk-based offline deduplication technologies often cause serious performance overhead due to data chunking and indexing. Particularly, they are not efficient for non-volatile memory (NVM) based storage systems because they cannot fully exploit the byte-addressability feature of NVMs for fine-grained deduplication. In this paper, we propose I/O Causality based In-line Deduplication (ICID) to maximize the deduplication ratio for NVM-based storage systems. Unlike previous inline deduplication schemes that use hash indexes to identify duplicate data slices, ICID records memory-copy operations in a B-tree structure to achieve causality-based inline deduplication. We propose two novel techniques to manage memory-copy records in the B-tree efficiently. First, to speed up the B-tree lookup, we group memory-copy records targeted to the same page in a B-tree node to improve data locality. Second, we exploit the spatial locality of memory accesses to identify outdated memory-copy records, and delete them in time to reduce memory consumption of the B-tree. We evaluate ICID in a system equipped with Intel Optane DC Persistent Memory Modules. For a typical KV store–LevelDB, our experimental results show that ICID achieves up to 16$\boldsymbol{\times}$× higher deduplication ratio and reduces the time cost of data deduplication by 47% on average compared with state-of-the-art deduplication schemes.