Academic Paper

A Language Prior Based Focal Loss for Visual Question Answering
Document Type
Conference
Source
2021 IEEE International Conference on Multimedia and Expo (ICME), pp. 1-6, Jul. 2021
Subject
Communication, Networking and Broadcast Technologies
Components, Circuits, Devices and Systems
Computing and Processing
Signal Processing and Analysis
Training
Deep learning
Visualization
Computational modeling
Refining
Predictive models
Knowledge discovery
Visual Question Answering
Language Priors
Focal Loss
Language
ISSN
1945-788X
Abstract
According to current research, one of the major challenges facing Visual Question Answering (VQA) models is overdependence on language priors at the expense of the visual modality. VQA models tend to predict answers based only on superficial correlations between the first few words of the question and the frequency of related answer candidates. To address this issue, we propose a novel Language Prior based Focal Loss (LP-Focal Loss) that rescales the standard cross-entropy loss. Specifically, we employ a question-only branch to capture the language bias for each answer candidate given the corresponding question input. The LP-Focal Loss then dynamically assigns lower weights to biased answers when computing the training loss, thereby reducing the contribution of more-biased instances in the training split. Extensive experiments show that the LP-Focal Loss can be applied to common baseline VQA models in general, and achieves significantly better performance on the VQA-CP v2 dataset, with an overall 18% accuracy boost over benchmark models.
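The abstract does not give the exact loss formula, but the described mechanism (rescaling cross-entropy by a question-only bias estimate, in the style of focal loss) can be sketched as follows. The modulating factor `(1 - p_bias) ** gamma` and the name `lp_focal_loss` are illustrative assumptions, not the paper's published formulation:

```python
import math

def lp_focal_loss(p_model, p_bias, gamma=2.0):
    """Hypothetical LP-Focal Loss sketch for one ground-truth answer.

    p_model: probability the full VQA model assigns to the ground-truth answer.
    p_bias:  probability a question-only branch assigns to the same answer,
             used as a proxy for the language prior.
    The cross-entropy term -log(p_model) is rescaled by (1 - p_bias)**gamma,
    so answers the question-only branch already predicts confidently
    (i.e., heavily biased instances) contribute less to training.
    """
    return -((1.0 - p_bias) ** gamma) * math.log(p_model)

# With equal model confidence, the more-biased sample is down-weighted.
unbiased_loss = lp_focal_loss(p_model=0.6, p_bias=0.1)
biased_loss = lp_focal_loss(p_model=0.6, p_bias=0.9)
```

With `p_bias = 0` the loss reduces to plain cross-entropy, which matches the abstract's framing of the method as a rescaling of the standard loss.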