Academic Paper

Variational Distribution Learning for Unsupervised Text-to-Image Generation
Document Type
Conference
Source
2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 23380-23389, Jun. 2023
Subject
Computing and Processing
Training
Computer vision
Image recognition
Text recognition
Computational modeling
Artificial neural networks
Semisupervised learning
Image and video synthesis and generation
Language
English
ISSN
2575-7075
Abstract
We propose a text-to-image generation algorithm based on deep neural networks for the setting where text captions of training images are unavailable. Instead of simply generating pseudo-ground-truth sentences for training images with existing image captioning methods, we employ a pretrained CLIP model, which properly aligns embeddings of images and their corresponding texts in a joint space and, consequently, performs well on zero-shot recognition tasks. We optimize a text-to-image generation model by maximizing the data log-likelihood conditioned on pairs of image-text CLIP embeddings. To better align data in the two domains, we adopt a principled approach based on variational inference, which efficiently estimates an approximate posterior over the hidden text embedding given an image and its CLIP feature. Experimental results validate that the proposed framework outperforms existing approaches by large margins under both unsupervised and semi-supervised text-to-image generation settings.
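The variational step in the abstract can be made concrete with a short sketch. The Python/PyTorch code below is a minimal, hypothetical illustration of amortized variational inference over a hidden text embedding: a small network predicts a Gaussian posterior q(t | x, c) from an image feature x and the image's CLIP embedding c, samples it with the reparameterization trick, and is trained against a negative evidence lower bound (ELBO). The class name, layer sizes, and the Gaussian posterior/prior choice are assumptions made for illustration; they are not the paper's actual architecture or objective.

import torch
import torch.nn as nn

class TextEmbeddingPosterior(nn.Module):
    # Hypothetical amortized posterior q(t | x, c): predicts a diagonal
    # Gaussian over the hidden text embedding t from an image feature x
    # and the image's CLIP embedding c. Dimensions are illustrative.
    def __init__(self, feat_dim=512, clip_dim=512, embed_dim=512):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(feat_dim + clip_dim, 1024),
            nn.ReLU(),
        )
        self.mu = nn.Linear(1024, embed_dim)
        self.log_var = nn.Linear(1024, embed_dim)

    def forward(self, image_feat, clip_image_embed):
        h = self.net(torch.cat([image_feat, clip_image_embed], dim=-1))
        return self.mu(h), self.log_var(h)

def sample_posterior(mu, log_var):
    # Reparameterization trick: t = mu + sigma * eps, so gradients flow
    # through the sampling step into the posterior network.
    eps = torch.randn_like(mu)
    return mu + torch.exp(0.5 * log_var) * eps

def negative_elbo(recon_log_prob, mu, log_var, prior_mu, prior_log_var):
    # Negative ELBO: reconstruction log-likelihood of the image given the
    # sampled embedding, minus the KL divergence between the approximate
    # posterior and a conditional Gaussian prior over the text embedding.
    kl = 0.5 * torch.sum(
        prior_log_var - log_var
        + (log_var.exp() + (mu - prior_mu) ** 2) / prior_log_var.exp()
        - 1.0,
        dim=-1,
    )
    return -(recon_log_prob - kl).mean()

Maximizing the ELBO lower-bounds the conditional data log-likelihood mentioned in the abstract, and the reparameterized sample keeps the estimator low-variance and differentiable, which is what makes the posterior estimation efficient in practice.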