Academic Paper

Hierarchical Diffusion Autoencoders and Disentangled Image Manipulation
Document Type
Conference
Source
2024 IEEE/CVF Winter Conference on Applications of Computer Vision (WACV), pp. 5362-5371, Jan. 2024
Subject
Computing and Processing
Interpolation
Visualization
Computer vision
Codes
Image synthesis
Semantics
Aerospace electronics
Algorithms: Generative models for image, video, 3D, etc.
Applications: Arts / games / social media
Language
ISSN
2642-9381
Abstract
Diffusion models have attained impressive visual quality for image synthesis. However, how to probe and manipulate the latent space of diffusion models has not been extensively explored. Prior work on diffusion autoencoders encodes semantic representations with a single latent code, neglecting low-level details and leading to entangled representations. To mitigate these limitations, we propose Hierarchical Diffusion Autoencoders (HDAE), which exploit the coarse-to-fine feature hierarchy for the latent space of diffusion models. Our HDAE converges more than 2 times faster and encodes richer, more comprehensive coarse-to-fine representations of images. The hierarchical latent space inherently disentangles features at different semantic levels. Furthermore, we propose a truncated-feature-based approach for disentangled image manipulation. We demonstrate the effectiveness of the proposed HDAE with extensive experiments and applications in image reconstruction, style mixing, controllable interpolation, image editing, and multi-modal semantic image synthesis. The code will be released upon acceptance.