Academic Paper

Structure and Content-Guided Video Synthesis with Diffusion Models

Document Type

Conference

Author

Esser, Patrick; Chiu, Johnathan; Atighehchian, Parmida; Granskog, Jonathan; Germanidis, Anastasis

Source

2023 IEEE/CVF International Conference on Computer Vision (ICCV), pp. 7312-7322, Oct. 2023

Subject

Computing and Processing
Signal Processing and Analysis
Training
Computer vision
Computational modeling
Natural languages

ISSN

2380-7504

Abstract

Text-guided generative diffusion models unlock powerful image creation and editing tools. Recent approaches that edit the content of footage while retaining its structure either require expensive re-training for every input or rely on error-prone propagation of image edits across frames. In this work, we present a structure and content-guided video diffusion model that edits videos based on descriptions of the desired output. Conflicts between user-provided content edits and structure representations occur due to insufficient disentanglement between the two aspects. As a solution, we show that training on monocular depth estimates with varying levels of detail provides control over structure and content fidelity. A novel guidance method, enabled by joint video and image training, exposes explicit control over temporal consistency. Our experiments demonstrate a wide variety of successes: fine-grained control over output characteristics, customization based on a few reference images, and a strong user preference for results produced by our model.
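The abstract states that joint video and image training enables a guidance method with explicit control over temporal consistency, but it does not give the formula. The sketch below assumes that the guidance mixes two noise predictions in the same way classifier-free guidance mixes conditional and unconditional predictions: one from the model run as a video model (frames processed jointly) and one from the same network run frame by frame as an image model. The function name `temporal_guidance` and the parameters `eps_video`, `eps_image`, and `omega_t` are hypothetical; this illustrates the general idea only and is not the authors' implementation.

```python
import numpy as np


def temporal_guidance(eps_video: np.ndarray,
                      eps_image: np.ndarray,
                      omega_t: float) -> np.ndarray:
    """Blend video and per-frame noise predictions (hypothetical sketch).

    Assumption: this mirrors classifier-free guidance, with the per-frame
    (image-model) prediction playing the role of the unconditional term.

    eps_video: prediction with frames processed jointly,
               shape (frames, channels, height, width).
    eps_image: prediction with each frame processed independently,
               same shape as eps_video.
    omega_t:   temporal guidance scale. 0.0 returns the per-frame
               prediction, 1.0 returns the video prediction, and values
               above 1.0 extrapolate further from the per-frame result.
    """
    if eps_video.shape != eps_image.shape:
        raise ValueError("predictions must have the same shape")
    # Move from the frame-independent prediction toward (and, for
    # omega_t > 1, beyond) the jointly denoised video prediction.
    return eps_image + omega_t * (eps_video - eps_image)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Random arrays stand in for real model outputs.
    eps_v = rng.standard_normal((4, 3, 8, 8))
    eps_i = rng.standard_normal((4, 3, 8, 8))
    guided = temporal_guidance(eps_v, eps_i, omega_t=1.5)
    print(guided.shape)
```

Under this assumption, `omega_t` acts as a single scalar knob: raising it weights the output toward whatever the video model does differently from independent per-frame denoising, which is where temporal coherence would come from.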
