Academic Paper

Crossing the Gap: Domain Generalization for Image Captioning

Document Type

Conference

Author

Ren, Yuchen; Mao, Zhendong; Fang, Shancheng; Lu, Yan; He, Tong; Du, Hao; Zhang, Yongdong; Ouyang, Wanli

Source

2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 2871-2880, Jun. 2023

Subject

Computing and Processing
Measurement
Training
Computer vision
Computational modeling
Semantics
Benchmark testing
Cognition
Vision, language, and reasoning

Language

ISSN

2575-7075

Abstract

Existing image captioning methods assume that the training and testing data come from the same domain, or that data from the target domain (i.e., the domain in which the testing data lie) are accessible. However, this assumption does not hold in real-world applications where data from the target domain are inaccessible. In this paper, we introduce a new setting called Domain Generalization for Image Captioning (DGIC), where data from the target domain are unseen during learning. We first construct a benchmark dataset for DGIC, which helps us investigate models' domain generalization (DG) ability on unseen domains. With the support of the new benchmark, we further propose a new framework called language-guided semantic metric learning (LSML) for the DGIC setting. Experiments on multiple datasets demonstrate the challenge of the task and the effectiveness of our newly proposed benchmark and LSML framework.

Online Access

Full Text (IEEE)
