Journal Article

DGFNet: Depth-Guided Cross-Modality Fusion Network for RGB-D Salient Object Detection
Document Type
Periodical
Source
IEEE Transactions on Multimedia, vol. 26, pp. 2648-2658, 2024
Subject
Components, Circuits, Devices and Systems
Communication, Networking and Broadcast Technologies
Computing and Processing
General Topics for Engineers
Feature extraction
Object detection
Fuses
Task analysis
Semantics
Data mining
Visualization
RGB-D salient object detection
depth quality
cross-modal feature fusion
Language
English
ISSN
1520-9210
1941-0077
Abstract
RGB-D salient object detection (SOD) focuses on exploiting the complementary cues of the RGB and depth modalities to detect and segment salient regions. However, many existing methods train their models in a simple multi-modal manner, ignoring the difference between the two modalities in their contribution to saliency detection. Furthermore, the quality of depth maps varies significantly across samples and is another important factor affecting model performance. To address these issues, this article proposes a novel depth-guided fusion network (DGFNet) for the RGB-D SOD task. To avoid the influence of low-quality depth maps on RGB-D SOD, we design a depth map enhancement algorithm that jointly models saliency detection and depth estimation to improve depth quality. We also propose a depth attention mechanism that encodes spatial information valuable for SOD, which is then used in the depth-guided fusion (DGF) module to guide the fusion of cross-modality features at each level. Extensive experiments on seven widely used benchmark datasets demonstrate that our DGFNet outperforms 23 state-of-the-art RGB-D SOD methods.
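To make the depth-guided fusion idea described in the abstract concrete, the following is a minimal PyTorch sketch of one fusion level: a depth feature is turned into a spatial attention map that re-weights the RGB feature before the two modalities are fused. The module names (DepthAttention, DGF) and their internal structure are illustrative assumptions, not the authors' released implementation.

# Minimal sketch of depth-attention-guided cross-modal fusion (assumed design,
# not the official DGFNet code).
import torch
import torch.nn as nn

class DepthAttention(nn.Module):
    """Encodes a single-channel spatial attention map from a depth feature."""
    def __init__(self, channels: int):
        super().__init__()
        self.conv = nn.Conv2d(channels, 1, kernel_size=3, padding=1)

    def forward(self, depth_feat: torch.Tensor) -> torch.Tensor:
        # Spatial attention values in [0, 1].
        return torch.sigmoid(self.conv(depth_feat))

class DGF(nn.Module):
    """Depth-guided fusion of RGB and depth features at one encoder level."""
    def __init__(self, channels: int):
        super().__init__()
        self.depth_att = DepthAttention(channels)
        self.fuse = nn.Conv2d(2 * channels, channels, kernel_size=3, padding=1)

    def forward(self, rgb_feat: torch.Tensor, depth_feat: torch.Tensor) -> torch.Tensor:
        att = self.depth_att(depth_feat)           # B x 1 x H x W attention map
        rgb_guided = rgb_feat * att + rgb_feat     # re-weight RGB features, keep a residual path
        return self.fuse(torch.cat([rgb_guided, depth_feat], dim=1))

if __name__ == "__main__":
    rgb = torch.randn(2, 64, 44, 44)
    dep = torch.randn(2, 64, 44, 44)
    print(DGF(64)(rgb, dep).shape)  # torch.Size([2, 64, 44, 44])

In the paper's full framework such a module would be applied at every level of the two-stream encoder, with the depth stream first refined by the depth map enhancement step; the sketch above only illustrates the single-level fusion pattern.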