Academic Paper
Surpass Teacher: Enlightenment Structured Knowledge Distillation of Transformer
Document Type
Conference
Source
2023 IEEE International Conference on Systems, Man, and Cybernetics (SMC), pp. 5102-5107, Oct. 2023
ISSN
2577-1655
Abstract
It is difficult to train a trustworthy Transformer model on a small image classification dataset. This research proposes a structured knowledge distillation algorithm that uses CNNs as the Transformer's teachers, significantly lowering the amount of training data needed. To better develop the potential of the CNN teachers, this research configures a public dataset for CNN teaching as an “enlightenment textbook” that guides the Transformer's training and keeps it from falling into a local optimum prematurely. The distillation process then employs a “learn-digest-self-distillation” learning strategy that enables the Transformer to assimilate CNN knowledge in a structured manner. Extensive experiments show that the proposed method significantly outperforms directly training the Transformer when the dataset is limited. Moreover, to demonstrate its practical application value, this research contributes a practical dataset for the classification of smoking and phone-calling. The corresponding code and dataset will be released at https://gitee.com/wustdch/surpass-teacher if this paper is accepted.
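The abstract describes distilling knowledge from CNN teachers into a Transformer student. As a rough illustration of the underlying objective, the sketch below implements the generic temperature-scaled soft-target distillation loss (Hinton-style) in pure Python; the paper's specific structured “learn-digest-self-distillation” schedule and the enlightenment-textbook dataset configuration are not reproduced here, and the function names and default hyperparameters (`temperature=4.0`, `alpha=0.7`) are illustrative assumptions, not values from the paper.

```python
import math

def softmax(logits, temperature=1.0):
    # Temperature-scaled softmax; a higher temperature softens the
    # distribution, exposing the teacher's "dark knowledge".
    scaled = [z / temperature for z in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def distillation_loss(student_logits, teacher_logits, true_label,
                      temperature=4.0, alpha=0.7):
    """Generic soft-target distillation objective (illustrative only):
    alpha * T^2 * KL(teacher || student) + (1 - alpha) * CE(student, label).
    """
    p_teacher = softmax(teacher_logits, temperature)
    p_student = softmax(student_logits, temperature)
    # KL divergence between softened teacher and student distributions.
    kl = sum(pt * math.log(pt / ps) for pt, ps in zip(p_teacher, p_student))
    # Standard cross-entropy against the ground-truth hard label.
    hard_ce = -math.log(softmax(student_logits)[true_label])
    # T^2 rescales the soft-target gradient to match the hard-label term.
    return alpha * (temperature ** 2) * kl + (1 - alpha) * hard_ce
```

When the student's logits exactly match the teacher's, the KL term vanishes and only the hard-label cross-entropy remains, which is one quick sanity check on such a loss.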