학술논문

Towards Practical Plug-and-Play Diffusion Models

Document Type

Conference

Author

Go, Hyojun; Lee, Yunsung; Kim, JinYoung; Lee, Seunghyun; Jeong, Myeongho; Lee, Hyun Seung; Choi, Seungtaek

Source

2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) CVPR Computer Vision and Pattern Recognition (CVPR), 2023 IEEE/CVF Conference on. :1962-1971 Jun, 2023

Subject

Computing and Processing
Training
Image synthesis
Semantic segmentation
Process control
Diffusion processes
Data models
Pattern recognition
Image and video synthesis and generation

Language

ISSN

2575-7075

Abstract

Diffusion-based generative models have achieved remarkable success in image generation. Their guidance formulation allows an external model to plug-and-play control the generation process for various tasks without finetuning the diffusion model. However, the direct use of publicly available off-the-shelf models for guidance fails due to their poor performance on noisy inputs. For that, the existing practice is to fine-tune the guidance models with labeled data corrupted with noises. In this paper, we argue that this practice has limitations in two aspects: (1) performing on inputs with extremely various noises is too hard for a single guidance model; (2) collecting labeled datasets hinders scaling up for various tasks. To tackle the limitations, we propose a novel strategy that leverages multiple experts where each expert is specialized in a particular noise range and guides the reverse process of the diffusion at its corresponding timesteps. However, as it is infeasible to manage multiple networks and utilize labeled data, we present a practical guidance framework termed Practical Plug-And-Play (PPAP), which leverages parameter-efficient fine-tuning and data-free knowledge transfer. We exhaustively conduct ImageNet class conditional generation experiments to show that our method can successfully guide diffusion with small trainable parameters and no labeled data. Finally, we show that image classifiers, depth estimators, and semantic segmentation models can guide publicly available GLIDE through our framework in a plug-and-play manner. Our code is available at https://github.com/riiid/PPAP.

Online Access

Full Text (IEEE) Find it@PNU

이메일

부산대학교 도서관

Online Access

메일 발송