Academic Paper

SF-YOLO: RGB-T Fusion Object Detection in UAV Scenes
Document Type
Conference
Source
2023 8th International Conference on Image, Vision and Computing (ICIVC), pp. 51-59, Jul. 2023
Subject
Computing and Processing
Fuses
Military computing
Computational modeling
Object detection
Reconnaissance
Feature extraction
Autonomous aerial vehicles
UAV
multimodal data fusion
object detection
shallow enhanced feature pyramid
Abstract
In recent years, unmanned aerial vehicles (UAVs) have been widely used for object detection because of their high flexibility, strong maneuverability, and ability to carry multiple types of cameras. With the development of RGB-Thermal (RGB-T) perception technology and growing military demand, network architectures that fuse RGB and thermal images to achieve a low false-alarm rate and high detection precision have become a research focus. When visible-light cameras are degraded by lighting conditions (such as low light or fog), thermal images provide useful complementary information. However, effectively fusing RGB and thermal images remains a challenge. Previous works have adopted simple fusion strategies, such as merging the two modalities at the input, concatenating multimodal features inside the model, or applying attention to each modality separately. These strategies are straightforward but insufficient. To address this issue, we propose SF-YOLO, a multimodal data fusion object detection network. We develop an attention interaction enhanced fusion module (AIEF) that fully exploits complementary information while reducing the impact of redundant information. In addition, to address the limited feature extraction ability of CNN architectures, we introduce the Swin-Transformer module, and to handle the large variations in target scale in UAV aerial images, we design a shallow enhanced feature pyramid (SEFP). We evaluate SF-YOLO through a series of experiments on public datasets and on our own dataset, and the results demonstrate the effectiveness of the proposed method. This architecture is expected to find applications in military reconnaissance and public safety.
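The abstract contrasts AIEF with three simpler baselines: merging the modalities at the input, concatenating features inside the model, and applying attention to each modality separately. The sketch below illustrates only those baselines on NumPy arrays shaped (channels, height, width). It is not the authors' AIEF module, whose internals the abstract does not describe. The feature maps are random stand-ins for backbone outputs, and the attention is an assumed parameter-free, squeeze-and-excitation-style channel gate with no learned weights.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def early_fusion(rgb, thermal):
    """Merge at the input: stack RGB (3, H, W) and thermal (1, H, W) into (4, H, W)."""
    return np.concatenate([rgb, thermal], axis=0)


def feature_concat_fusion(f_rgb, f_t):
    """Mid-level fusion: concatenate per-modality feature maps along the channel axis."""
    return np.concatenate([f_rgb, f_t], axis=0)


def channel_attention(f):
    """Assumed parameter-free channel gate: global average pooling per channel,
    then each channel is scaled by the sigmoid of its pooled value."""
    w = sigmoid(f.mean(axis=(1, 2)))  # shape (C,)
    return f * w[:, None, None]


def per_modality_attention_fusion(f_rgb, f_t):
    """Apply attention to each modality independently, then sum the results."""
    return channel_attention(f_rgb) + channel_attention(f_t)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    rgb, thermal = rng.random((3, 64, 64)), rng.random((1, 64, 64))
    f_rgb, f_t = rng.random((16, 8, 8)), rng.random((16, 8, 8))  # stand-in features
    print(early_fusion(rgb, thermal).shape)
    print(feature_concat_fusion(f_rgb, f_t).shape)
    print(per_modality_attention_fusion(f_rgb, f_t).shape)
```

Concatenation doubles the channel count, while summing after per-modality attention keeps it fixed. In neither case do one modality's features influence how the other is weighted, which is one plausible reading of the "attention interaction" in AIEF's name.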