Academic Paper

Reducing Spatial Fitting Error in Distillation of Denoising Diffusion Models

Document Type

Working Paper

Author

Zhou, Shengzhe; Lee, Zejian; Zhang, Shengyuan; Hou, Lefan; Yang, Changyuan; Yang, Guang; Yang, Zhiyuan; Sun, Lingyun

Source

Subject

Computer Science - Computer Vision and Pattern Recognition
Computer Science - Artificial Intelligence

Language

Abstract

Denoising diffusion models have exhibited remarkable capabilities in image generation. However, generating high-quality samples requires a large number of iterations. Knowledge distillation for diffusion models is an effective way to address this limitation by shortening the sampling process, but it degrades generative quality. Based on our analysis with bias-variance decomposition and experimental observations, we attribute the degradation to the spatial fitting error that occurs in the training of both the teacher and the student model. Accordingly, we propose the Spatial Fitting-Error Reduction Distillation model (SFERD). SFERD uses attention guidance from the teacher model and a designed semantic gradient predictor to reduce the student's fitting error. Empirically, our proposed model facilitates high-quality sample generation in a few function evaluations. We achieve an FID of 5.31 on CIFAR-10 and 9.39 on ImageNet 64×64 with only one step, outperforming existing diffusion methods. Our study provides a new perspective on diffusion distillation by highlighting the intrinsic denoising ability of models. Project link: https://github.com/Sainzerjj/SFERD
Comment: AAAI 2024
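
Note: The abstract refers to a bias-variance decomposition but does not state it. As a generic illustration only, and not the paper's own derivation, the sketch below checks the textbook identity MSE = bias² + variance for repeated estimates of a single fixed target. The function name `bias_variance` and the synthetic estimates are assumptions introduced here for illustration.

```python
import numpy as np


def bias_variance(predictions, target):
    """Split the mean squared error of repeated estimates of one fixed target.

    Returns (mse, bias_squared, variance), where variance is the
    population variance (ddof=0), so that mse == bias_squared + variance.
    """
    predictions = np.asarray(predictions, dtype=float)
    mean_pred = predictions.mean()
    bias_sq = (mean_pred - target) ** 2
    variance = predictions.var()
    mse = np.mean((predictions - target) ** 2)
    return mse, bias_sq, variance


# Hypothetical stand-in for a model's estimates of one value across runs:
# systematically too low (bias) and noisy (variance).
rng = np.random.default_rng(0)
preds = 0.8 + 0.1 * rng.standard_normal(1000)

mse, bias_sq, variance = bias_variance(preds, target=1.0)
print(f"mse={mse:.4f}  bias^2={bias_sq:.4f}  variance={variance:.4f}")
```

In this toy setting the systematic offset contributes the bias term and the run-to-run noise contributes the variance term; how the paper maps these terms onto the teacher and student fitting errors is not specified in this record.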

Online Access

Open Access (arXiv)
