Academic paper

Interaction Compass: Multi-Label Zero-Shot Learning of Human-Object Interactions via Spatial Relations
Document Type
Conference
Source
2021 IEEE/CVF International Conference on Computer Vision (ICCV), pp. 8452-8463, Oct. 2021
Subject
Computing and Processing
Training
Visualization
Computer vision
Image recognition
Annotations
Computational modeling
Genomics
Transfer/Low-shot/Semi/Unsupervised Learning
Action and behavior recognition
Recognition and classification
Language
English
ISSN
2380-7504
Abstract
We study the problem of multi-label zero-shot recognition in which labels are in the form of human-object interactions (combinations of actions on objects), each image may contain multiple interactions, and some interactions have no training images. We propose a novel compositional learning framework that decouples interaction labels into separate action and object scores that incorporate the spatial compatibility between the two components. We combine these scores to efficiently recognize seen and unseen interactions. However, learning action-object spatial relations, in principle, requires bounding-box annotations, which are costly to gather. Moreover, it is not clear how to generalize spatial relations to unseen interactions. We address these challenges by developing a cross-attention mechanism that localizes objects from action locations and vice versa by predicting displacements between them, referred to as relational directions. During training, we estimate the relational directions as those that maximize the scores of ground-truth interactions, guiding predictions toward compatible action-object regions. Through extensive experiments, we show the effectiveness of our framework, improving the state of the art by 2.6% mAP on HICO and 5.8% recall on Visual Genome.
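To make the compositional scoring idea in the abstract concrete, the following is a minimal sketch (not the authors' released code) of how decoupled action and object scores can be recomposed into interaction scores, with a spatial-compatibility term standing in for the relational-direction mechanism. All function names, tensor shapes, and the additive combination rule are illustrative assumptions.

```python
import torch

def interaction_scores(action_scores, object_scores, spatial_compat):
    """
    action_scores:  [B, A]     per-image score for each action
    object_scores:  [B, O]     per-image score for each object
    spatial_compat: [B, A, O]  compatibility of the predicted action->object displacement
    returns:        [B, A, O]  scores for every action-object interaction,
                               including combinations unseen during training
    """
    # Broadcast the decoupled scores over all action-object pairs and add the
    # spatial term; unseen interactions are scored by recomposing seen components.
    return action_scores.unsqueeze(2) + object_scores.unsqueeze(1) + spatial_compat

if __name__ == "__main__":
    B, A, O = 2, 117, 80  # HICO has 117 actions and 80 object categories
    a = torch.randn(B, A)
    o = torch.randn(B, O)
    s = torch.randn(B, A, O)
    print(interaction_scores(a, o, s).shape)  # torch.Size([2, 117, 80])
```

Because the interaction score factors into action, object, and spatial terms, a combination never observed at training time can still be scored as long as its action and object components were each seen in other interactions.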