Academic Paper

TolerantGAN: Text-Guided Image Manipulation Tolerant to Real-World Image
Document Type
Periodical
Source
IEEE Open Journal of Signal Processing, vol. 5, pp. 150-159, 2024
Subject
Signal Processing and Analysis
Training
Manipulators
Crops
Hair
Codes
Training data
Signal processing
Text-guided image manipulation
generative adversarial network
manipulation direction
out-of-domain data
Language
English
ISSN
2644-1322
Abstract
Although text-guided image manipulation approaches have demonstrated highly accurate editing of image appearance in virtual or simple scenarios, their real-world application faces significant challenges. The primary cause is the misalignment between the distributions of training and real-world data, which leads to unstable text-guided image manipulation. In this work, we propose a novel framework called TolerantGAN and tackle the new task of real-world text-guided image manipulation that is independent of the training data. To achieve this, we introduce two key components: a border smoothly connection module (BSCM) and a manipulation direction-based attention module (MDAM). BSCM smooths out the misalignment between the distributions of training and real-world data, while MDAM extracts only the regions highly relevant to the manipulation and assists in reconstructing objects that were not observed in the training data. For in-the-wild input images of various classes, TolerantGAN robustly outperforms state-of-the-art methods.
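This record contains no implementation details, so the sketch below is only a rough illustration of the two ideas the abstract names, under assumptions not in the source: BSCM is approximated here as a learned blend that restyles a real-world feature map toward training-domain statistics, and MDAM as a spatial attention gate scored against a manipulation-direction vector from the text side. The class names, tensor shapes, and use of PyTorch are all hypothetical, not the authors' actual design.

# Hypothetical sketch only: the paper's actual BSCM/MDAM architectures are
# not described in this record. Names, shapes, and PyTorch usage are assumed.
import torch
import torch.nn as nn


class BSCMSketch(nn.Module):
    """Stand-in for the border smoothly connection module (BSCM): blends a
    real-world feature map toward training-domain statistics to soften the
    distribution misalignment the abstract describes."""

    def __init__(self, channels: int):
        super().__init__()
        # Learnable per-channel blend weight, squashed to [0, 1] by sigmoid.
        self.blend = nn.Parameter(torch.zeros(channels))
        # Training-domain statistics, assumed to be collected offline.
        self.register_buffer("train_mean", torch.zeros(channels))
        self.register_buffer("train_std", torch.ones(channels))

    def forward(self, feat: torch.Tensor) -> torch.Tensor:
        # feat: (B, C, H, W) feature map from a real-world image.
        mean = feat.mean(dim=(2, 3), keepdim=True)
        std = feat.std(dim=(2, 3), keepdim=True) + 1e-5
        normalized = (feat - mean) / std
        # Restyle toward the training-domain statistics.
        restyled = (normalized * self.train_std.view(1, -1, 1, 1)
                    + self.train_mean.view(1, -1, 1, 1))
        w = torch.sigmoid(self.blend).view(1, -1, 1, 1)
        return w * restyled + (1 - w) * feat


class MDAMSketch(nn.Module):
    """Stand-in for the manipulation direction-based attention module (MDAM):
    scores each spatial location against a manipulation-direction vector so
    that only regions relevant to the requested edit pass through."""

    def __init__(self, channels: int, dir_dim: int):
        super().__init__()
        self.proj = nn.Conv2d(channels, dir_dim, kernel_size=1)

    def forward(self, feat: torch.Tensor, direction: torch.Tensor) -> torch.Tensor:
        # feat: (B, C, H, W); direction: (B, dir_dim) from the text encoder.
        keys = self.proj(feat)                        # (B, D, H, W)
        attn = torch.einsum("bdhw,bd->bhw", keys, direction)
        attn = torch.sigmoid(attn).unsqueeze(1)       # (B, 1, H, W) gate
        return feat * attn                            # keep relevant regions


if __name__ == "__main__":
    feat = torch.randn(2, 64, 32, 32)
    direction = torch.randn(2, 16)
    smoothed = BSCMSketch(64)(feat)
    out = MDAMSketch(64, 16)(smoothed, direction)
    print(out.shape)  # torch.Size([2, 64, 32, 32])

The ordering here (BSCM before MDAM) mirrors the abstract's narrative of first smoothing the domain gap and then attending to edit-relevant regions; how the real framework composes the two modules inside the GAN is not specified in this record.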