
e-Article

ObjDedup: High-Throughput Object Storage Layer for Backup Systems With Block-Level Deduplication
Document Type
Periodical
Source
IEEE Transactions on Parallel and Distributed Systems, 34(7):2180–2197, July 2023
Subject
Computing and Processing
Communication, Networking and Broadcast Technologies
Metadata
Engines
Throughput
Object recognition
Cloud computing
Aerospace electronics
Quality of service
Backup storage
deduplication
object storage
secondary storage
Language
English
ISSN
1045-9219
1558-2183
2161-9883
Abstract
The immense popularity of object storage is also affecting the backup market. Not only have novel backup solutions emerged that use cloud-based object storage as backends, but support for object storage interfaces is also increasingly expected from traditional dedicated backup appliances. This latter trend especially concerns systems with data deduplication, as they can offer compelling gains in storage capacity and throughput. However, such systems have been designed for interfaces and workloads that are markedly different from those encountered in object storage. Notably, they expect data to be written in portions that are orders of magnitude longer than those written by the novel object-storage-oriented backup applications. In this light, our contribution is twofold. First, contrasting the properties of object storage interfaces with usage patterns from 686 commercial deployments of backup appliances, we identify specific issues that an implementation of such an interface has to address to offer adequate performance in a backup system with block-level deduplication. In particular, we show that a major challenge is efficient metadata management. Second, we present distributed data structures and algorithms to handle object metadata in backup systems with block-level deduplication. We then implement them as an object storage layer for our HYDRAstor backup system. Compared to object storage without in-line deduplication, our solution achieves 1.8–3.93× higher write throughput. Compared to object storage on top of a state-of-the-art file-based backup system, it processes 5.26–11.34× more object put operations per unit of time.
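To illustrate why object metadata becomes a central concern under block-level deduplication, the toy sketch below stores each object as an ordered list of block hashes (a per-object "manifest") over a shared pool of unique blocks. It is not the ObjDedup or HYDRAstor design: it assumes fixed-size blocks addressed by SHA-256, runs in a single process, and all names (`ToyDedupObjectStore`, `BLOCK_SIZE`, `put`, `get`) are hypothetical.

```python
import hashlib

BLOCK_SIZE = 4  # hypothetical, tiny fixed block size chosen only for illustration


class ToyDedupObjectStore:
    """Toy single-node object store with block-level deduplication (illustrative only)."""

    def __init__(self, block_size: int = BLOCK_SIZE) -> None:
        self.block_size = block_size
        self.blocks: dict[str, bytes] = {}       # block hash -> block contents (stored once)
        self.objects: dict[str, list[str]] = {}  # object name -> ordered block hashes

    def put(self, name: str, data: bytes) -> int:
        """Store an object; return how many new (non-duplicate) blocks were written."""
        manifest = []
        new_blocks = 0
        for i in range(0, len(data), self.block_size):
            block = data[i:i + self.block_size]
            digest = hashlib.sha256(block).hexdigest()
            if digest not in self.blocks:
                # Only previously unseen content consumes block storage.
                self.blocks[digest] = block
                new_blocks += 1
            manifest.append(digest)
        # Every put writes a metadata entry, even when all of its blocks were duplicates.
        self.objects[name] = manifest
        return new_blocks

    def get(self, name: str) -> bytes:
        """Reassemble an object from its manifest of block hashes."""
        return b"".join(self.blocks[h] for h in self.objects[name])


store = ToyDedupObjectStore()
store.put("a", b"abcdabcdxyz")  # writes 2 new blocks: b"abcd" and b"xyz"
store.put("b", b"abcdxyz")      # writes 0 new blocks, but still adds a manifest entry
```

In this sketch the second put stores no new data yet still creates object metadata, so with many small objects the cost of maintaining manifests, rather than of storing blocks, can dominate. This is only a simplified intuition for the abstract's point about metadata management, not a model of the paper's distributed data structures.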