Academic Article

SFSANet: Multiscale Object Detection in Remote Sensing Image Based on Semantic Fusion and Scale Adaptability
Document Type
Periodical
Source
IEEE Transactions on Geoscience and Remote Sensing (IEEE Trans. Geosci. Remote Sensing). 62:1-10, 2024
Subject
Geoscience
Signal Processing and Analysis
Semantics
Feature extraction
Object detection
Remote sensing
Visualization
Classification algorithms
Location awareness
receptive field
remote sensing image
semantic fusion (SF)
Language
ISSN
0196-2892
1558-0644
Abstract
Remote sensing image object detection plays an important role in computer vision. Although object detection algorithms have made significant progress, detecting multiscale objects in remote sensing images remains difficult: because object feature information is insufficiently utilized, the detection accuracy for multiscale objects is low. To address these issues, this article proposes SFSANet, an object detection algorithm for remote sensing images based on semantic fusion and scale adaptability. First, because existing methods ignore the semantic differences between feature maps of different scales, a semantic fusion (SF) module is proposed to enrich semantic information and improve the ability to classify and locate objects. Next, because objects are easily disturbed by complex backgrounds, which degrades detection performance, a spatial location attention (SLA) module is constructed to suppress background information and make key objects more prominent. Additionally, a scale adaptability (SA) module is designed to enrich the expression of feature information, integrate global and local information, and preserve the integrity of the image structure. Finally, we adopt the SIoU loss function as the localization loss to speed up model convergence. To verify the effectiveness of the proposed method, we conduct experiments on the mainstream datasets DIOR and NWPU VHR-10, and the results demonstrate the superiority of the proposed method.
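The abstract names SIoU as the localization loss but gives no formula. As a rough illustration only, the sketch below follows the commonly cited SIoU formulation, in which an angle-aware distance cost and a shape cost are added to 1 - IoU. The function name `siou_loss`, the corner-format boxes `(x1, y1, x2, y2)`, and the shape exponent `theta=4.0` are assumptions made for this example; the record does not state how SFSANet configures or implements the loss.

```python
import math


def siou_loss(pred, target, theta=4.0, eps=1e-9):
    """Hypothetical SIoU sketch for two boxes given as (x1, y1, x2, y2)."""
    px1, py1, px2, py2 = pred
    tx1, ty1, tx2, ty2 = target
    pw, ph = px2 - px1, py2 - py1
    tw, th = tx2 - tx1, ty2 - ty1

    # Plain IoU of the two boxes.
    iw = max(0.0, min(px2, tx2) - max(px1, tx1))
    ih = max(0.0, min(py2, ty2) - max(py1, ty1))
    inter = iw * ih
    iou = inter / (pw * ph + tw * th - inter + eps)

    # Width and height of the smallest box enclosing both boxes.
    cw = max(px2, tx2) - min(px1, tx1)
    ch = max(py2, ty2) - min(py1, ty1)

    # Offset between the box centers.
    dx = (tx1 + tx2) / 2 - (px1 + px2) / 2
    dy = (ty1 + ty2) / 2 - (py1 + py2) / 2
    sigma = math.hypot(dx, dy) + eps

    # Angle cost: 0 when the centers are axis-aligned, 1 at 45 degrees.
    sin_alpha = min(abs(dy) / sigma, 1.0)
    angle = 1 - 2 * math.sin(math.asin(sin_alpha) - math.pi / 4) ** 2

    # Distance cost, weighted by the angle cost through gamma.
    gamma = 2 - angle
    rho_x = (dx / (cw + eps)) ** 2
    rho_y = (dy / (ch + eps)) ** 2
    dist = (1 - math.exp(-gamma * rho_x)) + (1 - math.exp(-gamma * rho_y))

    # Shape cost: penalizes relative mismatch in width and height.
    omega_w = abs(pw - tw) / (max(pw, tw) + eps)
    omega_h = abs(ph - th) / (max(ph, th) + eps)
    shape = (1 - math.exp(-omega_w)) ** theta + (1 - math.exp(-omega_h)) ** theta

    return 1 - iou + (dist + shape) / 2
```

Under these assumptions, identical boxes give a loss close to zero, while non-overlapping boxes give a loss above 1 because the distance cost is added to the 1 - IoU term. A training pipeline would normally use a batched tensor version of the same arithmetic rather than this scalar form.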