Academic Paper

Action Semantic Alignment for Image Captioning

Document Type

Conference

Author

Huo, Da; Kastner, Marc A.; Komamizu, Takahiro; Ide, Ichiro

Source

2022 IEEE 5th International Conference on Multimedia Information Processing and Retrieval (MIPR), pp. 194-197, Aug. 2022

Subject

Communication, Networking and Broadcast Technologies
Computing and Processing
Geoscience
Robotics and Control Systems
Signal Processing and Analysis
Training
Measurement
Semantics
Information processing
Task analysis

Language

ISSN

2770-4319

Abstract

Image captioning, which aims to generate proper descriptions of images, is one of the main goals in vision and language processing. Recently, attention mechanisms have become crucial in captioning tasks, as they can capture global dependencies between modalities. Moreover, some works have used objects detected in the input image as anchor points, so-called object tags, to ease such alignment, resulting in good performance on this task. In this paper, we introduce action information as a prior to further improve on this by adding action tags for training. The action tags make it possible to learn alignment at the action-semantic level and to capture the previously ignored dimension of action, which could be very important in image captioning. We found that training with action tags makes it possible to describe images in a dynamic style. Furthermore, we found that it leads to a significant improvement in captioning performance over other methods, as measured by common metrics.
