Academic Paper

Hierarchical Diffusion Autoencoders and Disentangled Image Manipulation

Document Type

Conference

Author

Lu, Zeyu; Wu, Chengyue; Chen, Xinyuan; Wang, Yaohui; Bai, Lei; Qiao, Yu; Liu, Xihui

Source

2024 IEEE/CVF Winter Conference on Applications of Computer Vision (WACV), pp. 5362-5371, Jan. 2024

Subject

Computing and Processing
Interpolation
Visualization
Computer vision
Codes
Image synthesis
Semantics
Aerospace electronics
Algorithms
Generative models for image, video, 3D, etc.
Applications
Arts / games / social media

ISSN

2642-9381

Abstract

Diffusion models have attained impressive visual quality for image synthesis. However, how to probe and manipulate the latent space of diffusion models has not been extensively explored. Prior work on diffusion autoencoders encodes semantic representations in a single latent code, neglecting low-level details and leading to entangled representations. To mitigate these limitations, we propose Hierarchical Diffusion Autoencoders (HDAE), which exploit the coarse-to-fine feature hierarchy for the latent space of diffusion models. Our HDAE converges 2+ times faster and encodes richer and more comprehensive coarse-to-fine representations of images. The hierarchical latent space inherently disentangles different semantic levels of features. Furthermore, we propose a truncated-feature-based approach for disentangled image manipulation. We demonstrate the effectiveness of the proposed HDAE with extensive experiments and applications in image reconstruction, style mixing, controllable interpolation, image editing, and multi-modal semantic image synthesis. The code will be released upon acceptance.
