Academic Paper

GraFT: Gradual Fusion Transformer for Multimodal Re-Identification

Document Type

Working Paper

Author

Yin, Haoli; Li, Jiayao; Schiller, Eva; McDermott, Luke; Cummings, Daniel

Source

Subject

Computer Science - Computer Vision and Pattern Recognition

Language

Abstract

Object Re-Identification (ReID) is pivotal in computer vision, witnessing an escalating demand for adept multimodal representation learning. Current models, although promising, reveal scalability limitations with increasing modalities as they rely heavily on late fusion, which postpones the integration of specific modality insights. Addressing this, we introduce the Gradual Fusion Transformer (GraFT) for multimodal ReID. At its core, GraFT employs learnable fusion tokens that guide self-attention across encoders, adeptly capturing both modality-specific and object-specific features. Further bolstering its efficacy, we introduce a novel training paradigm combined with an augmented triplet loss, optimizing the ReID feature embedding space. We demonstrate these enhancements through extensive ablation studies and show that GraFT consistently surpasses established multimodal ReID benchmarks. Additionally, aiming for deployment versatility, we've integrated neural network pruning into GraFT, offering a balance between model size and performance.
Comment: 3 Borderline Reviews at WACV, 8 pages, 5 figures, 8 tables

Online Access

Open Access (arXiv)
