Academic Paper

Layer-Specific Knowledge Distillation for Class Incremental Semantic Segmentation
Document Type
Periodical
Source
IEEE Transactions on Image Processing, 33:1977-1989, 2024
Subject
Signal Processing and Analysis
Communication, Networking and Broadcast Technologies
Computing and Processing
Semantics
Semantic segmentation
Task analysis
Predictive models
Optimization methods
Iterative methods
Heuristic algorithms
Knowledge distillation
incremental learning
semantic segmentation
Language
ISSN
1057-7149
1941-0042
Abstract
Recently, class incremental semantic segmentation (CISS), which targets the practical open-world setting, has attracted increasing research interest; it is mainly challenged by the well-known issue of catastrophic forgetting. In particular, knowledge distillation (KD) techniques have been widely studied to alleviate catastrophic forgetting. Despite their promising performance, existing KD-based methods generally apply the same distillation scheme to different intermediate layers to transfer old knowledge, while employing manually tuned, fixed trade-off weights to control the effect of KD. These methods take no account of the feature characteristics of different intermediate layers, limiting the effectiveness of KD for CISS. In this paper, we propose a layer-specific knowledge distillation (LSKD) method that assigns appropriate distillation schemes and weights to individual intermediate layers according to their feature characteristics, aiming to further explore the potential of KD in improving the performance of CISS. Specifically, we present a mask-guided distillation (MD) to alleviate background shift on semantic features, which performs distillation after masking out the features affected by the background. Furthermore, a mask-guided context distillation (MCD) is presented to exploit the global context information lying in high-level semantic features. Based on these, LSKD assigns different distillation schemes according to feature characteristics. To adjust the effect of layer-specific distillation adaptively, LSKD introduces a regularized gradient equilibrium method to learn dynamic trade-off weights. Additionally, LSKD attempts to learn the distillation schemes and trade-off weights of different layers simultaneously by developing a bi-level optimization method. Extensive experiments on the widely used Pascal VOC 12 and ADE20K datasets show that LSKD clearly outperforms its counterparts while achieving state-of-the-art results.
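The mask-guided distillation (MD) described in the abstract can be sketched as a feature-matching loss that is evaluated only at spatial positions not affected by the background. The sketch below is a minimal illustration of that idea, assuming L2 feature matching and a binary background mask; the function name, tensor shapes, and normalization are illustrative assumptions, not the paper's exact formulation.

```python
import numpy as np

def masked_feature_distillation(feat_student, feat_teacher, background_mask):
    """Illustrative mask-guided distillation loss (assumed L2 form).

    feat_student, feat_teacher: intermediate feature maps, shape (B, C, H, W).
    background_mask: binary mask, shape (B, H, W), 1 = background pixel.
    Distillation is applied only where the mask is 0 (foreground), so
    features corrupted by background shift do not constrain the new model.
    """
    # Keep-mask: 1 at foreground positions, broadcast over the channel axis.
    keep = (1.0 - background_mask)[:, None, :, :]          # (B, 1, H, W)
    diff = (feat_student - feat_teacher) ** 2              # squared error
    # Normalize by the number of retained (position, channel) entries;
    # the small epsilon guards against an all-background mask.
    denom = keep.sum() * feat_student.shape[1] + 1e-8
    return (diff * keep).sum() / denom

# Toy example: unit error everywhere, no background -> loss of 1.0.
fs = np.ones((2, 8, 4, 4))
ft = np.zeros((2, 8, 4, 4))
mask = np.zeros((2, 4, 4))
loss = masked_feature_distillation(fs, ft, mask)
```

In a CISS setting, `feat_teacher` would come from the frozen old model and the loss would be weighted per layer; the paper's LSKD additionally learns those trade-off weights via its regularized gradient equilibrium and bi-level optimization, which this sketch does not cover.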