Academic Paper

Improved Adversarial Robustness by Hardened Prediction
Document Type
Conference
Source
2022 IEEE International Symposium on Information Theory (ISIT), pp. 2952-2956, Jun. 2022
Subject
Communication, Networking and Broadcast Technologies
Training
Computational modeling
Neurons
Predictive models
Robustness
Biological neural networks
Information theory
Language
English
ISSN
2157-8117
Abstract
We find a way to harden the decisions of a neural network. Combining this hardening effect with an existing adversarial training method further improves adversarial robustness. By suppressing the logit of the class in which the model has the highest confidence during training, the model is encouraged to make harder predictions. This significantly improves a model's robustness against gradient-based adversarial attacks. The simplicity of our method makes it easy to deploy on top of existing adversarial training schemes with almost no computational overhead. Experimental results show that a model trained with TRADES benefits from hardening: it exhibits greatly improved robustness against the PGD attack while retaining similar performance against decision-based attacks. How the hardening effect so effectively defends models from gradient-based attacks is worth further investigation.
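The core idea in the abstract (suppressing the top logit during training so the model must push its prediction harder to compensate) can be sketched in a few lines. This is a minimal illustrative sketch, not the paper's exact formulation: the function name `harden_logits` and the suppression strength `alpha` are hypothetical, and the paper may apply the suppression differently (e.g., inside the loss rather than on the logits directly).

```python
import math

def softmax(logits):
    """Standard numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    s = sum(exps)
    return [e / s for e in exps]

def harden_logits(logits, alpha=0.5):
    """Suppress the logit of the most confident class by a fixed
    amount `alpha` (hypothetical parameter). Training the usual
    cross-entropy loss on these suppressed logits would penalize
    soft predictions, encouraging harder ones."""
    k = max(range(len(logits)), key=lambda i: logits[i])
    out = list(logits)
    out[k] -= alpha  # suppress only the top-confidence logit
    return out

# Only the argmax logit is reduced; the rest are untouched.
logits = [2.0, 1.0, 0.5]
hardened = harden_logits(logits, alpha=0.5)
```

During training, the cross-entropy loss would be computed on `hardened` instead of `logits`, so the model must raise its true-class logit further to achieve the same loss, which is one plausible reading of the hardening effect described above.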