Academic Paper

CenterNet++ for Object Detection
Document Type
Periodical
Source
IEEE Transactions on Pattern Analysis and Machine Intelligence, 46(5):3509-3521, May 2024
Subject
Computing and Processing
Bioengineering
Proposals
Object detection
Shape
Feature extraction
Detectors
Real-time systems
Geometry
Keywords
Anchor-free
bottom-up
deep learning
object detection
Language
English
ISSN
0162-8828 (print)
2160-9292 (CD)
1939-3539 (electronic)
Abstract
There are two mainstream approaches to object detection: top-down and bottom-up. The state-of-the-art approaches are mainly top-down methods. In this paper, we demonstrate that bottom-up approaches achieve competitive performance compared with top-down approaches and have higher recall rates. Our approach, named CenterNet, detects each object as a triplet of keypoints: the top-left corner, the bottom-right corner, and the center keypoint. We first group the corners according to some designed cues and then confirm the object locations based on the center keypoints. The corner keypoints allow the approach to detect objects of various scales and shapes, and the center keypoint reduces the confusion introduced by the large number of false-positive proposals. Our approach is an anchor-free detector because it does not need to define explicit anchor boxes. We adapt our approach to backbones with different structures, including 'hourglass'-like networks and 'pyramid'-like networks, which detect objects on single-resolution and multi-resolution feature maps, respectively. On the MS-COCO dataset, CenterNet with Res2Net-101 and with Swin-Transformer achieves average precisions (APs) of 53.7% and 57.1%, respectively, outperforming all existing bottom-up detectors and achieving state-of-the-art performance. We also design a real-time CenterNet model, which achieves a good trade-off between accuracy and speed, with an AP of 43.6% at 30.5 frames per second (FPS).
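The center-keypoint confirmation step described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the function names are hypothetical, and it assumes the central-region scheme from the original CenterNet paper, where a corner-pair box is kept only if a detected center keypoint falls inside its central region (a 3×3 grid split for small boxes, 5×5 for large ones).

```python
# Illustrative sketch of center-keypoint verification (hypothetical names;
# assumes the n x n central-region split from the original CenterNet paper).

def central_region(x1, y1, x2, y2, n=3):
    """Return the central cell of a box divided into an n x n grid.

    The paper uses n=3 for small boxes and n=5 for large ones, so that
    the relative size of the central region adapts to the box scale.
    """
    w, h = x2 - x1, y2 - y1
    margin = (n - 1) / (2 * n)          # fraction trimmed from each side
    return (x1 + w * margin, y1 + h * margin,
            x2 - w * margin, y2 - h * margin)

def confirm_box(box, centers, n=3):
    """Keep a corner-pair proposal only if some detected center
    keypoint lies inside the proposal's central region."""
    cx1, cy1, cx2, cy2 = central_region(*box, n=n)
    return any(cx1 <= x <= cx2 and cy1 <= y <= cy2 for x, y in centers)

# Example: for box (0, 0, 90, 90) with n=3, the central region is
# (30, 30, 60, 60).
print(confirm_box((0, 0, 90, 90), [(45, 45)]))  # a center inside -> True
print(confirm_box((0, 0, 90, 90), [(10, 80)]))  # no center inside -> False
```

Proposals whose central region contains no center keypoint are discarded, which is how the center keypoint suppresses false-positive corner pairings.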