Academic Article
Hybrid Transformers With Attention-Guided Spatial Embeddings for Makeup Transfer and Removal
Document Type
Periodical
Source
IEEE Transactions on Circuits and Systems for Video Technology, 34(4):2876-2890, Apr. 2024
ISSN
1051-8215
1558-2205
Abstract
Existing makeup transfer methods typically transfer only simple makeup colors in well-conditioned face images and fail to handle makeup style details (e.g., complicated colors and shapes) and facial occlusion. To address these problems, this paper proposes Hybrid Transformers with Attention-guided Spatial Embeddings (named HT-ASE) for makeup transfer and removal. Specifically, a makeup context extractor adopts makeup context global-local interactions to aggregate the high-level context and low-level detail features of the makeup styles, yielding context-aware makeup features that encode the complicated colors and shapes of the makeup styles. A face identity extractor adopts a face identity local interaction to aggregate the identity-relevant features of shallow layers into identity semantic features, thereby refining the identity features. A spatially similarity-aware fusion network introduces a spatially-adaptive layer-instance normalization with attention-guided spatial embeddings to perform semantic alignment and fusion between the makeup and identity features, yielding precise and robust transfer results even under large spatial misalignment and facial occlusion. Extensive experimental results demonstrate that the proposed method outperforms state-of-the-art methods, especially in preserving makeup style details and handling facial occlusion.
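The spatially-adaptive layer-instance normalization mentioned in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation; it only shows the general pattern such layers follow: blend instance normalization and layer normalization with a mixing weight (here a hypothetical scalar `rho`), then modulate the result with per-pixel `gamma`/`beta` maps, which in the paper would be predicted from the attention-guided makeup embeddings.

```python
import numpy as np

def instance_norm(x, eps=1e-5):
    """Normalize each channel of x (shape C, H, W) over its spatial dims."""
    mu = x.mean(axis=(1, 2), keepdims=True)
    var = x.var(axis=(1, 2), keepdims=True)
    return (x - mu) / np.sqrt(var + eps)

def layer_norm(x, eps=1e-5):
    """Normalize x (shape C, H, W) jointly over all channels and pixels."""
    return (x - x.mean()) / np.sqrt(x.var() + eps)

def spatially_adaptive_lin(x, rho, gamma_map, beta_map):
    """Hypothetical spatially-adaptive layer-instance normalization.

    rho blends instance norm (rho=1) and layer norm (rho=0); gamma_map
    and beta_map are spatial modulation maps broadcastable to x's shape,
    standing in for parameters predicted from the makeup features.
    """
    x_hat = rho * instance_norm(x) + (1.0 - rho) * layer_norm(x)
    return gamma_map * x_hat + beta_map

# Example: a 3-channel 8x8 feature map modulated by identity maps.
features = np.random.randn(3, 8, 8)
out = spatially_adaptive_lin(features, 0.5,
                             np.ones_like(features),
                             np.zeros_like(features))
```

With `gamma_map` all ones and `beta_map` all zeros, the layer reduces to the plain blend of the two normalizations; spatially varying maps are what let the fusion network apply different makeup statistics at different facial regions.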