Journal Article

Incremental Object Detection With Image-Level Labels
Document Type
Periodical
Source
IEEE Transactions on Artificial Intelligence, 5(5):2331-2341, May 2024
Subject
Computing and Processing
Feature extraction
Object detection
Head
Detectors
Task analysis
Training
Annotations
Deep neural networks
incremental learning
object detection
Language
English
ISSN
2691-4581
Abstract
Incremental object detection (IOD) aims to predict both old and new samples, in localization and classification, as new concepts are introduced. It is a challenging task because it requires a joint interpretation of semantic and spatial information. Whereas existing work generalizes the detector with the help of annotated classes and bounding boxes for new samples, this article presents a surprising finding: new bounding boxes are unnecessary, which substantially reduces the annotation cost and the conditional constraints in IOD. To enable incremental detection with only image-level labels, we propose a multibranch decoupling scheme in which the representation, classification, and regression branches are each customized to accommodate new concepts whose class labels carry a different semantic level. First, the regression branch is frozen to preserve generalizable localization while keeping representation optimization stable. Then, the representation branch is gradient-inverted to estimate the distribution drift from the old object-level set to the new image-level one, improving the effectiveness of feature updates in the absence of homogeneous semantic and spatial supervision. Finally, the classification branch is calibrated to balance the contribution of different images to the gradient: the frequency ratios of old instances and new images are used as a postprocessing factor on the logit margin. Extensive experiments on two standard incremental detection benchmarks demonstrate strong recognition and localization performance with a reduced annotation burden for new classes, outperforming state-of-the-art methods by 2% and 3%, respectively.
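The abstract names two mechanisms that can be sketched concretely: gradient inversion on the representation branch (forward identity, sign-flipped gradient on the backward pass) and a frequency-ratio margin applied to the classification logits as postprocessing. The following is a minimal, framework-free sketch of both ideas under stated assumptions; the function names (`reverse_gradient`, `calibrate_logits`) and parameters (`lam`, `tau`) are illustrative and not taken from the paper.

```python
import math

def reverse_gradient(x, lam=1.0):
    """Gradient-reversal pseudo-layer: the forward pass is the identity,
    while the returned backward function flips (and scales) incoming
    gradients, the usual way drift between two distributions is estimated
    adversarially. `lam` is a hypothetical scaling coefficient."""
    def backward(upstream_grads):
        # Sign-flip and scale each upstream gradient component.
        return [-lam * g for g in upstream_grads]
    return x, backward

def calibrate_logits(logits, class_freq, tau=1.0):
    """Post-hoc logit-margin calibration: subtract a margin proportional
    to the log of each class's frequency ratio, so infrequent new
    (image-level) classes are not dominated by frequent old-instance
    classes. The exact margin form here is an assumption in the spirit
    of standard logit adjustment, not the paper's formula."""
    total = sum(class_freq)
    return [z - tau * math.log(f / total) for z, f in zip(logits, class_freq)]
```

Usage: with raw logits `[2.0, 1.0]` and hypothetical counts `[90, 10]` (old instances vs. new images), calibration boosts the rare class by a larger margin, so the second logit overtakes the first even though it started lower.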