Academic Paper

RGB-D Salient Object Detection Using Saliency and Edge Reverse Attention
Document Type
Periodical
Source
IEEE Access, vol. 11, pp. 68818-68825, 2023
Subject
Aerospace
Bioengineering
Communication, Networking and Broadcast Technologies
Components, Circuits, Devices and Systems
Computing and Processing
Engineered Materials, Dielectrics and Plasmas
Engineering Profession
Fields, Waves and Electromagnetics
General Topics for Engineers
Geoscience
Nuclear Engineering
Photonics and Electrooptics
Power, Energy and Industry Applications
Robotics and Control Systems
Signal Processing and Analysis
Transportation
Image edge detection
Feature extraction
Object detection
Task analysis
Indexes
Saliency detection
Image resolution
Deep learning
RGB-D salient object detection
reverse attention
Language
ISSN
2169-3536
Abstract
RGB-D salient object detection is the task of detecting visually significant objects in an image using RGB and depth images. Although many effective CNN-based methods have been proposed, problems such as blurred object boundaries and failure to detect important parts of objects still reduce detection accuracy. In this paper, we propose RGB-D salient object detection using Saliency and Edge Reverse Attention (SERA), which combines the fusion of saliency and edge features with reverse attention. The saliency reverse attention process makes object boundaries and previously undetected objects easier to capture by inverting the up-sampled saliency features and using them to weight other saliency features, while the edge reverse attention process makes salient object regions stand out by inverting the edge features and using them to weight the saliency features. The interaction between the salient object features and the edge features mutually enhances both and refines the information on object boundaries and salient object regions. In addition, to make the global information of an image easier to reference, we introduce the Multi-Scale Interactive Module (MSIM), which acquires information at rich scales by converting feature maps to different resolutions and letting them interact. Beyond the salient object output, supervised learning is applied to the edge outputs at each resolution to improve the accuracy of both salient objects and boundary regions. Experimental results on five benchmarks show that the proposed method quantitatively outperforms conventional methods and qualitatively improves the sharpness of object boundaries and the accuracy of detecting important parts of objects.
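The reverse attention idea described in the abstract can be sketched as follows. This is an illustrative NumPy toy, not the paper's implementation: the function name, shapes, and toy data are assumptions; the only part taken from the abstract is the core operation of inverting a predicted saliency map and using it to re-weight feature maps so that low-confidence regions (boundaries, missed objects) receive high weights.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def reverse_attention(saliency_logits, features):
    """Illustrative sketch of saliency reverse attention: invert the
    (up-sampled) saliency prediction, then use the inverted map to
    re-weight a feature map so later layers focus on regions the
    current prediction misses."""
    attention = 1.0 - sigmoid(saliency_logits)      # invert predicted saliency
    return features * attention[np.newaxis, :, :]   # broadcast over channels

# Toy example: a 4x4 saliency map confident only in the centre.
saliency = np.full((4, 4), -5.0)   # background: low logits
saliency[1:3, 1:3] = 5.0           # detected object: high logits
features = np.ones((8, 4, 4))      # 8-channel feature map

weighted = reverse_attention(saliency, features)
# Features in already-detected regions are suppressed, while boundary
# and background features are preserved for further refinement.
```

The edge reverse attention described in the abstract is the complementary operation: the inverted edge map, rather than the inverted saliency map, supplies the weights applied to the saliency features.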
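The Multi-Scale Interactive Module can likewise be sketched in miniature. Again this is a hedged toy under assumed details: the paper does not specify the resampling operators or fusion rule, so plain average pooling, nearest-neighbour upsampling, and additive exchange stand in for them; only the overall pattern of converting a feature map to different resolutions and letting the scales interact comes from the abstract.

```python
import numpy as np

def downsample2x(x):
    """2x2 average pooling on a (channels, H, W) array."""
    c, h, w = x.shape
    return x.reshape(c, h // 2, 2, w // 2, 2).mean(axis=(2, 4))

def upsample2x(x):
    """Nearest-neighbour 2x upsampling on a (channels, H, W) array."""
    return x.repeat(2, axis=1).repeat(2, axis=2)

def msim(x):
    """Illustrative two-scale sketch of the Multi-Scale Interactive
    Module: derive a coarser view of the feature map, exchange
    information between the two resolutions, and fuse them back at
    the input resolution."""
    coarse = downsample2x(x)            # global context at lower resolution
    fine = x + upsample2x(coarse)       # coarse scale informs the fine scale
    coarse = coarse + downsample2x(x)   # fine scale informs the coarse scale
    return fine + upsample2x(coarse)    # fuse both scales

x = np.random.rand(8, 8, 8)  # toy (channels, H, W) feature map
y = msim(x)                  # same resolution, multi-scale context mixed in
```

The fused output has the same shape as the input, so such a module can be dropped between encoder and decoder stages without changing the surrounding architecture.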