Academic Paper

Consumer-Centric Insights Into Resilient Small Object Detection: SCIoU Loss and Recursive Transformer Network
Document Type
Periodical
Source
IEEE Transactions on Consumer Electronics, 70(1):2178-2187, Feb. 2024
Subject
Power, Energy and Industry Applications
Components, Circuits, Devices and Systems
Fields, Waves and Electromagnetics
Neck
Transformers
Head
Task analysis
Shape
Autonomous aerial vehicles
Detectors
Unmanned aerial vehicle (UAV) image
small object detection
you only look once
Bottleneck transformer
Language
English
ISSN
0098-3063
1558-4127
Abstract
As an emerging consumer electronic product, the unmanned aerial vehicle (UAV) has received growing attention and favor for a variety of tasks in both the enterprise and individual consumer electronics markets in recent years. Deep-neural-network-based object detectors are convenient to embed into UAV products; however, drone-captured images pose challenges of object occlusion, large scale differences, and complex backgrounds, because these detectors are not designed for the small and tiny objects found in aerial images. To address this problem, we propose an improved YOLO paradigm called SR-YOLO, with an Efficient Neck, Shape CIoU, and a Recursion Bottleneck Transformer, for better object detection performance in consumer-level UAV products. First, an efficient neck structure is presented to retain richer features through a small-object detection layer and an up-sampling operator suited to small objects. Second, we design a new prediction-box loss function called shape complete-IoU (SCIoU), which uses a width (height) limiting factor to remedy the deficiency that CIoU focuses only on aspect ratios, by accounting for both the aspect ratio and the ratio of the two boxes' widths. Moreover, by combining a recurrent neural network with a multi-head self-attention mechanism in a cyclic manner, a recursive bottleneck transformer is constructed to relieve the impact of the highly dense scenes and occlusion problems that exist in UAV images. We conduct extensive experiments on two public datasets, VisDrone2019 and TinyPerson, where the results show that the proposed model surpasses the compared YOLO baseline by 8.1% and 3.2% in $mAP_{50}$, respectively. In addition, analysis and case studies further validate SR-YOLO's superiority and effectiveness.
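To make the loss design concrete, the following is a minimal sketch of a CIoU-style bounding-box loss extended with a shape penalty on the direct width/height ratios, in the spirit of the SCIoU described above. The IoU, center-distance, and aspect-ratio terms follow the standard CIoU formulation; the `shape` term is an illustrative assumption, not the paper's exact SCIoU formula.

```python
import math

def iou(b1, b2):
    """Plain IoU between two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(b1[0], b2[0]), max(b1[1], b2[1])
    ix2, iy2 = min(b1[2], b2[2]), min(b1[3], b2[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    a1 = (b1[2] - b1[0]) * (b1[3] - b1[1])
    a2 = (b2[2] - b2[0]) * (b2[3] - b2[1])
    return inter / (a1 + a2 - inter + 1e-9)

def sciou_loss(pred, gt):
    """CIoU loss plus a hypothetical width/height limiting factor."""
    i = iou(pred, gt)
    # Center distance over the enclosing box diagonal (CIoU distance term).
    cx_p, cy_p = (pred[0] + pred[2]) / 2, (pred[1] + pred[3]) / 2
    cx_g, cy_g = (gt[0] + gt[2]) / 2, (gt[1] + gt[3]) / 2
    ex1, ey1 = min(pred[0], gt[0]), min(pred[1], gt[1])
    ex2, ey2 = max(pred[2], gt[2]), max(pred[3], gt[3])
    rho2 = (cx_p - cx_g) ** 2 + (cy_p - cy_g) ** 2
    c2 = (ex2 - ex1) ** 2 + (ey2 - ey1) ** 2 + 1e-9
    # Aspect-ratio consistency term v and its weight alpha (standard CIoU).
    w_p, h_p = pred[2] - pred[0], pred[3] - pred[1]
    w_g, h_g = gt[2] - gt[0], gt[3] - gt[1]
    v = (4 / math.pi ** 2) * (math.atan(w_g / h_g) - math.atan(w_p / h_p)) ** 2
    alpha = v / (1 - i + v + 1e-9)
    # Assumed shape penalty: CIoU's v vanishes whenever the two boxes share an
    # aspect ratio, even if their sizes differ; this term also compares the
    # widths and heights directly, which is the gap SCIoU targets.
    shape = ((w_p - w_g) ** 2 / max(w_p, w_g) ** 2
             + (h_p - h_g) ** 2 / max(h_p, h_g) ** 2) / 2
    return 1 - i + rho2 / c2 + alpha * v + shape
```

For two boxes with identical aspect ratios but different sizes, the CIoU aspect-ratio term contributes nothing, while the shape term still drives the predicted width and height toward the ground truth.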