Academic Paper

The Design of a Lossless Deduplication Scheme to Eliminate Fine-Grained Redundancy for JPEG Image Storage Systems
Document Type
Periodical
Source
IEEE Transactions on Computers (IEEE Trans. Comput.), 73(5):1385-1399, May 2024
Subject
Computing and Processing
Image coding
Transform coding
Discrete cosine transforms
Throughput
Quantization (signal)
Redundancy
Feature extraction
Image deduplication
fine-grained deduplication
delta compression
JPEG compression
storage systems
Language
ISSN
0018-9340
1557-9956
2326-3814
Abstract
Image data storage has grown explosively, so image deduplication is used to save storage space by eliminating redundancy between images. However, traditional image deduplication can neither eliminate fine-grained redundancy nor guarantee lossless results. In this work, we propose imDedup, a lossless, fine-grained deduplication scheme for JPEG image storage systems. Specifically, imDedup uses Feature Bitmap, a novel sampling hash method, to detect similar images quickly by exploiting the information distribution of JPEG data. It also uses Idelta, a novel delta encoder that incorporates image compression into deduplication, so that the non-redundant data can be re-compressed via image encoding, which improves the compression ratio. In addition, we propose the DCHash and Fixed-Point Matching (FPM) techniques to further speed up Idelta. We also propose imDedup-plus, which dynamically chooses between the DCHash-based and FPM-based compressors to achieve higher throughput without sacrificing the compression ratio. Experimental results on five datasets demonstrate the superiority of the imDedup-based methods. Compared with the state-of-the-art similarity detector and delta encoder, imDedup achieves 1.8–4.4× higher throughput and 1.3–1.7× higher compression ratios, respectively. Moreover, imDedup-plus achieves 1.3–2.9× higher throughput than imDedup without sacrificing the compression ratio.