Academic paper

Surpass Teacher: Enlightenment Structured Knowledge Distillation of Transformer
Document Type
Conference
Source
2023 IEEE International Conference on Systems, Man, and Cybernetics (SMC), pp. 5102-5107, Oct. 2023
Subject
Aerospace
Bioengineering
Communication, Networking and Broadcast Technologies
Computing and Processing
General Topics for Engineers
Power, Energy and Industry Applications
Robotics and Control Systems
Signal Processing and Analysis
Transportation
Training
Knowledge engineering
Uncertainty
Pedestrians
Transformers
Task analysis
Optimization
Language
English
ISSN
2577-1655
Abstract
It is difficult to train a trustworthy Transformer model on a small image classification dataset. This research proposes a structured knowledge distillation algorithm that uses CNNs as the Transformer's teachers, significantly reducing the amount of training data needed. To better develop the potential of the CNN tutors, this research configures a public dataset as an "enlightenment textbook" for CNN teaching, guiding the Transformer's training and keeping it from prematurely falling into local optima. The distillation process then employs a "learn-digest-self-distillation" strategy that enables the Transformer to assimilate the CNNs' knowledge in a structured manner. Extensive experiments show that, with limited data, the proposed method significantly outperforms directly training the Transformer. Moreover, to demonstrate its practical application value, this research contributes a dataset for classifying smoking and phone-calling behavior. The corresponding code and dataset will be released at https://gitee.com/wustdch/surpass-teacher if this paper is accepted.
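For context, below is a minimal PyTorch sketch of the conventional soft-label knowledge distillation loss (Hinton et al., 2015) that this line of work builds on, with a CNN supplying teacher logits and a Transformer supplying student logits. The function name and the values of the temperature T and weight alpha are illustrative assumptions; the paper's specific "enlightenment textbook" configuration and "learn-digest-self-distillation" schedule are not reproduced here.

import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels, T=4.0, alpha=0.7):
    # Soft-label knowledge distillation: KL divergence against the
    # temperature-softened teacher distribution, blended with the usual
    # cross-entropy on ground-truth labels. T and alpha are illustrative
    # hyperparameters, not values taken from the paper.
    soft = F.kl_div(
        F.log_softmax(student_logits / T, dim=-1),
        F.softmax(teacher_logits / T, dim=-1),
        reduction="batchmean",
    ) * (T * T)  # the T*T factor keeps gradient scale comparable across temperatures
    hard = F.cross_entropy(student_logits, labels)
    return alpha * soft + (1.0 - alpha) * hard

# Dummy usage: a batch of 8 samples over 10 classes; in the paper's setting
# teacher_logits would come from a frozen CNN and student_logits from the Transformer.
student_logits = torch.randn(8, 10)
teacher_logits = torch.randn(8, 10)
labels = torch.randint(0, 10, (8,))
loss = distillation_loss(student_logits, teacher_logits, labels)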