Academic Journal Article

Adaptive Fine-Grained Predicates Learning for Scene Graph Generation
Document Type
Periodical
Source
IEEE Transactions on Pattern Analysis and Machine Intelligence, 45(11):13921-13940, Nov. 2023
Subject
Computing and Processing
Bioengineering
Adaptation models
Correlation
Transformers
Tail
Task analysis
Head
Visualization
Scene graph generation
visual relationship
fine-grained learning
adaptive learning
Language
English
ISSN
0162-8828 (Print)
2160-9292 (CD-ROM)
1939-3539 (Electronic)
Abstract
The performance of current Scene Graph Generation (SGG) models is severely hampered by hard-to-distinguish predicates, e.g., “woman-on/standing on/walking on-beach”. Since general SGG models tend to predict head predicates while re-balancing strategies favor tail categories, neither can appropriately handle hard-to-distinguish predicates. To tackle this issue, inspired by fine-grained image classification, which focuses on differentiating hard-to-distinguish objects, we propose an Adaptive Fine-Grained Predicates Learning (FGPL-A) framework that aims to differentiate hard-to-distinguish predicates for SGG. First, we introduce an Adaptive Predicate Lattice (PL-A) to identify hard-to-distinguish predicates; it adaptively explores predicate correlations in step with the model's dynamic learning pace. In practice, the PL-A is initialized from the SGG dataset and refined by exploring the model's predictions on the current mini-batch. Utilizing the PL-A, we propose an Adaptive Category Discriminating Loss (CDL-A) and an Adaptive Entity Discriminating Loss (EDL-A), which progressively regularize the model's discriminating process with fine-grained supervision reflecting its dynamic learning status, ensuring a balanced and efficient learning process. Extensive experimental results show that our proposed model-agnostic strategy significantly boosts the performance of benchmark models on the VG-SGG and GQA-SGG datasets by up to 175% and 76% on Mean Recall@100, achieving new state-of-the-art performance. Moreover, experiments on Sentence-to-Graph Retrieval and Image Captioning tasks further demonstrate the practicability of our method.
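Note: the abstract describes the mechanism only at a high level. The following is a minimal PyTorch sketch of that idea under stated assumptions, not the authors' implementation: the class PredicateLattice, the function category_discriminating_loss, the EMA coefficient momentum, and the margin scale tau are all illustrative names and choices; the paper's actual PL-A/CDL-A/EDL-A formulations should be taken from the full text.

    import torch
    import torch.nn.functional as F

    class PredicateLattice:
        """Tracks pairwise predicate correlations: initialized from dataset
        statistics, then refined with the model's mini-batch predictions
        (a hypothetical stand-in for the PL-A described in the abstract)."""

        def __init__(self, dataset_cooccurrence: torch.Tensor, momentum: float = 0.9):
            # dataset_cooccurrence: [C, C] counts relating each ground-truth
            # predicate (row) to every predicate category (column).
            row_sums = dataset_cooccurrence.sum(dim=1, keepdim=True).clamp(min=1)
            self.correlation = dataset_cooccurrence / row_sums  # row-normalized [C, C]
            self.momentum = momentum

        @torch.no_grad()
        def refine(self, logits: torch.Tensor, labels: torch.Tensor) -> None:
            """Blend the current batch's soft confusion into the lattice so it
            keeps pace with the model's learning status."""
            C = self.correlation.size(0)
            probs = logits.softmax(dim=-1)                   # [N, C]
            batch_conf = torch.zeros_like(self.correlation)
            batch_conf.index_add_(0, labels, probs)          # sum soft predictions per GT class
            counts = torch.bincount(labels, minlength=C).clamp(min=1).unsqueeze(1)
            batch_conf = batch_conf / counts                 # mean confusion per GT class
            seen = torch.bincount(labels, minlength=C) > 0   # only update classes in this batch
            self.correlation[seen] = (self.momentum * self.correlation[seen]
                                      + (1.0 - self.momentum) * batch_conf[seen])

    def category_discriminating_loss(logits, labels, lattice, tau: float = 1.0):
        """Margin-based cross-entropy: predicates the lattice flags as strongly
        correlated with the ground truth (hard-to-distinguish ones) receive a
        positive logit margin, so the model must separate them by a wider gap."""
        margin = tau * lattice.correlation[labels]           # [N, C]
        margin = margin.scatter(1, labels.unsqueeze(1), 0.0) # no margin on the GT class
        return F.cross_entropy(logits + margin, labels)

    # Usage sketch: one training step on random data.
    if __name__ == "__main__":
        C, N = 5, 8
        cooc = torch.rand(C, C) * 10
        lattice = PredicateLattice(cooc)
        logits = torch.randn(N, C, requires_grad=True)
        labels = torch.randint(0, C, (N,))
        loss = category_discriminating_loss(logits, labels, lattice)
        loss.backward()
        lattice.refine(logits.detach(), labels)              # adapt the lattice per mini-batch

The EMA-refined correlation matrix mirrors the abstract's "initialized from the SGG dataset, refined per mini-batch" step, and the additive margin is one simple way to realize "fine-grained supervision" that penalizes confusable predicates more than clearly separable ones.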