Academic Article

PKU-AIGI-500K: A Neural Compression Benchmark and Model for AI-Generated Images
Document Type
Periodical
Source
IEEE Journal on Emerging and Selected Topics in Circuits and Systems, 14(2):172-184, Jun. 2024
Subject
Components, Circuits, Devices and Systems
Image coding
Codecs
Circuits and systems
Measurement
Image synthesis
Image quality
Integrated circuit modeling
Image compression
AIGI
image feature
text-to-image alignment
subjective evaluation
Language
English
ISSN
2156-3357 (Print)
2156-3365 (Electronic)
Abstract
In recent years, artificial intelligence-generated content (AIGC) enabled by foundation models has received increasing attention and is undergoing remarkable development. Text prompts can be elegantly translated into high-quality, photo-realistic images. This capability, however, introduces extremely high bandwidth requirements for compressing and transmitting the vast number of AI-generated images (AIGIs) produced by such AIGC services. Yet research on compression methods for AIGIs remains conspicuously lacking despite being undeniably necessary. This work addresses this critical gap by introducing the pioneering AIGI dataset, PKU-AIGI-500K, encompassing over 105k diverse prompts and 528k images derived from five major foundation models. Through this dataset, we analyze the essential characteristics of AIGC images and empirically show that existing data-driven lossy compression methods achieve sub-optimal rate-distortion performance without fine-tuning, primarily due to a domain shift between AIGIs and natural images. We comprehensively benchmark the rate-distortion performance and runtime complexity of openly available conventional and learned image coding solutions, uncovering new insights for emerging studies in AIGI compression. Moreover, to harness the full potential of the redundant information shared between an AIGI and its corresponding text, we propose an AIGI compression model (Cross-Attention Transformer Codec, CATC) trained on this dataset as a strong baseline. Experimental results demonstrate that our proposed model achieves up to 30.09% bitrate reduction compared to the state-of-the-art (SOTA) H.266/VVC codec and outperforms the SOTA learned codec, paving the way for future research in AIGI compression.
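The 30.09% figure quoted above is the kind of result conventionally reported with the Bjøntegaard delta-rate (BD-rate) metric, which measures the average bitrate change of a test codec relative to an anchor (here, H.266/VVC) at equal quality. The abstract does not include the computation, so the following is a minimal Python sketch of the standard BD-rate calculation; the function name `bd_rate` and its inputs are illustrative, not taken from the paper.

```python
import numpy as np

def bd_rate(rate_anchor, psnr_anchor, rate_test, psnr_test):
    """Bjontegaard delta rate: average bitrate change (%) of the test
    codec relative to the anchor at equal PSNR.

    Each input is a sequence of four or more rate / PSNR points
    sampled from one codec's rate-distortion curve.
    """
    # Work in log-rate so the cubic fit is well conditioned.
    log_ra = np.log(np.asarray(rate_anchor, dtype=float))
    log_rt = np.log(np.asarray(rate_test, dtype=float))
    pa = np.asarray(psnr_anchor, dtype=float)
    pt = np.asarray(psnr_test, dtype=float)

    # Fit log-rate as a cubic polynomial of PSNR for each codec.
    poly_a = np.polyfit(pa, log_ra, 3)
    poly_t = np.polyfit(pt, log_rt, 3)

    # Integrate both fits over the overlapping PSNR interval.
    lo, hi = max(pa.min(), pt.min()), min(pa.max(), pt.max())
    int_a = np.polyval(np.polyint(poly_a), hi) - np.polyval(np.polyint(poly_a), lo)
    int_t = np.polyval(np.polyint(poly_t), hi) - np.polyval(np.polyint(poly_t), lo)

    # Average log-rate difference, converted to a percent bitrate change.
    avg_diff = (int_t - int_a) / (hi - lo)
    return (np.exp(avg_diff) - 1.0) * 100.0
```

A negative return value indicates that the test codec needs less bitrate than the anchor at the same quality; under this convention, a bitrate reduction of 30.09% over VVC would appear as a BD-rate of roughly -30.09.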