Academic Article

Post-Distillation via Neural Resuscitation
Document Type
Periodical
Source
IEEE Transactions on Multimedia, vol. 26, pp. 3046-3060, 2024
Subject
Components, Circuits, Devices and Systems
Communication, Networking and Broadcast Technologies
Computing and Processing
General Topics for Engineers
Neurons
Computational modeling
Standards
Optimization
Knowledge engineering
Task analysis
Probabilistic logic
Deep learning
knowledge distillation
model regularization
transfer learning
Language
English
ISSN
Print ISSN: 1520-9210
Electronic ISSN: 1941-0077
Abstract
Knowledge distillation, a widely adopted model compression technique, distils knowledge from a large teacher model into a smaller student model with the goal of reducing the computational resources the student model requires. However, most existing distillation approaches focus on the types of knowledge and how to distil them, neglecting the student model's neuronal responses to that knowledge. In this article, we demonstrate that the Kullback-Leibler loss inhibits neuronal responses in the opposite gradient direction, which impairs the student model's potential during distillation. To address this problem, we introduce a principled dual-stage distillation scheme that rejuvenates all inhibited neurons at the neuronal level. In the first stage, we monitor all the neurons in the student model during the standard distillation period and divide them into two groups according to their responses. In the second stage, we propose three strategies to resuscitate the two groups of neurons differently, which allows us to exploit the full potential of the student model. Experiments on various aspects of knowledge distillation verify that the proposed approach outperforms current state-of-the-art approaches. Our work provides a neuronal perspective for studying the student model's response to the knowledge transferred from the teacher model.
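To make the setting concrete, below is a minimal PyTorch sketch, not the authors' released implementation, of one standard distillation step with a temperature-scaled Kullback-Leibler term, plus a hypothetical check in the spirit of the abstract's first stage: flagging student neurons whose activations the loss gradient would push toward zero. The layer choice, the names `kd_step`, `T`, `alpha`, and `inhibited_mask`, and the sign-based inhibition criterion are all illustrative assumptions, not details taken from the paper.

```python
# Illustrative sketch only -- not the paper's method or released code.
# Assumes a teacher/student pair producing class logits and a chosen
# student layer whose per-neuron activations we want to inspect.
import torch
import torch.nn.functional as F

def kd_step(student, teacher, layer, x, labels, T=4.0, alpha=0.9):
    """One distillation step with the usual temperature-scaled KL term,
    plus a hypothetical stage-1 style check for "inhibited" neurons."""
    captured = {}

    def hook(_module, _inputs, output):
        captured["acts"] = output  # keep activations in the autograd graph

    handle = layer.register_forward_hook(hook)
    try:
        with torch.no_grad():
            t_logits = teacher(x)
        s_logits = student(x)
    finally:
        handle.remove()

    # Soft-target KL term (temperature-scaled) plus hard-label CE term.
    kl = F.kl_div(
        F.log_softmax(s_logits / T, dim=1),
        F.softmax(t_logits / T, dim=1),
        reduction="batchmean",
    ) * (T * T)
    ce = F.cross_entropy(s_logits, labels)
    loss = alpha * kl + (1.0 - alpha) * ce

    # Hypothetical criterion: a neuron counts as inhibited when the loss
    # gradient w.r.t. its activation has the same sign as the activation,
    # i.e. a gradient-descent update pushes that response toward zero --
    # the kind of suppression the abstract attributes to the KL loss.
    acts = captured["acts"]
    grad = torch.autograd.grad(loss, acts, retain_graph=True)[0]
    inhibited_mask = (grad * acts) > 0

    loss.backward()
    return loss.detach(), inhibited_mask
```

In practice the mask would be accumulated over the standard distillation period to split neurons into the two groups the abstract describes; the three second-stage resuscitation strategies are specific to the paper and are not sketched here.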