Academic Journal Article

Defeating Misclassification Attacks Against Transfer Learning
Document Type
Periodical
Source
IEEE Transactions on Dependable and Secure Computing, 20(2):886-901, Apr. 2023
Subject
Computing and Processing
Transfer learning
Task analysis
Mathematical models
Training
Computational modeling
Data models
Perturbation methods
Deep neural network
defence against adversarial examples
transfer learning
pre-trained model
Language
English
ISSN
1545-5971 (Print)
1941-0018 (Electronic)
2160-9209 (CD)
Abstract
Transfer learning is prevalent as a technique to efficiently generate new models (Student models) based on the knowledge transferred from a pre-trained model (Teacher model). However, Teacher models are often publicly available for sharing and reuse, which inevitably introduces vulnerabilities that can trigger severe attacks against transfer learning systems. In this article, we take a first step towards mitigating one of the most advanced misclassification attacks in transfer learning. We design a distilled differentiator via activation-based network pruning to weaken attack transferability while retaining accuracy. We adopt an ensemble structure of variant differentiators to improve the robustness of the defence. To avoid a bloated ensemble at inference time, we propose a two-phase defence, in which inference from the Student model is first performed to narrow down the candidate differentiators to be assembled, and then only a small, fixed number of them are chosen to validate clean inputs or reject adversarial ones. Our comprehensive evaluations on both large and small image recognition tasks confirm that Student models with our defence of only five differentiators are immune to over 90% of adversarial inputs, with an accuracy loss of less than 10%. Our comparison also demonstrates that our design outperforms prior defences.
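As an illustration of the activation-based pruning idea mentioned in the abstract, the following minimal PyTorch sketch zeroes out the convolutional channels whose mean activation over a calibration set is lowest. It is a sketch under assumptions, not the authors' exact procedure; the function name, the calibration loader, and the pruning ratio are hypothetical.

import torch
import torch.nn as nn

def prune_low_activation_channels(model, layer, calib_loader, ratio=0.3):
    """Illustrative activation-based pruning: zero the output channels of
    `layer` (a nn.Conv2d) whose mean absolute activation over a
    calibration set is lowest. Hypothetical sketch, not the paper's code."""
    sums, count = None, 0

    def hook(_module, _inputs, out):
        nonlocal sums, count
        # Mean absolute activation per output channel, accumulated per batch.
        act = out.detach().abs().mean(dim=(0, 2, 3))
        sums = act if sums is None else sums + act
        count += 1

    handle = layer.register_forward_hook(hook)
    model.eval()
    with torch.no_grad():
        for x, _ in calib_loader:
            model(x)
    handle.remove()

    mean_act = sums / count
    n_prune = int(ratio * mean_act.numel())
    idx = torch.argsort(mean_act)[:n_prune]   # weakest channels first
    with torch.no_grad():
        layer.weight[idx] = 0.0               # mask those filters
        if layer.bias is not None:
            layer.bias[idx] = 0.0
    return idx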
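The two-phase defence described in the abstract can likewise be sketched as follows: the Student's prediction first narrows the candidate differentiators, and then a small, fixed ensemble votes to accept the input as clean or reject it as adversarial. The class-to-differentiator mapping and the majority-vote rule here are assumptions made for illustration.

import torch

def two_phase_defence(x, student, differentiators, n_vote=5):
    """Illustrative two-phase check: `differentiators` is assumed to map a
    predicted class to its candidate differentiator models. Returns the
    Student's label if a majority of n_vote differentiators agree,
    otherwise None (input rejected as likely adversarial)."""
    student.eval()
    with torch.no_grad():
        pred = student(x).argmax(dim=1).item()

        # Phase 1: use the Student's label to select candidate differentiators.
        candidates = differentiators[pred][:n_vote]

        # Phase 2: each differentiator re-classifies x; agreement with the
        # Student's label counts as a vote that x is clean.
        agree = sum(int(d(x).argmax(dim=1).item() == pred) for d in candidates)

    if agree >= (len(candidates) + 1) // 2:
        return pred   # majority agreement: accept as clean
    return None       # rejected as adversarial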