Academic Paper

Improved Adversarial Robustness by Hardened Prediction
Document Type
Conference
Source
2022 IEEE International Symposium on Information Theory (ISIT), pp. 2952-2956, Jun. 2022
Subject
Communication, Networking and Broadcast Technologies
Training
Computational modeling
Neurons
Predictive models
Robustness
Biological neural networks
Information theory
Language
English
ISSN
2157-8117
Abstract
We find a way to harden the decisions of a neural network. Combining this hardening effect with an existing adversarial training method further improves adversarial robustness. By suppressing the logit of the class in which the model has the highest confidence during training, the model is encouraged to make harder predictions. This significantly improves a model's robustness against gradient-based adversarial attacks. The simplicity of our method makes it easy to deploy on top of existing adversarial training schemes with almost no computational overhead. Experimental results show that a model trained with TRADES benefits from hardening: it exhibits greatly improved robustness against the PGD attack while retaining similar performance against decision-based attacks. How the hardening effect so effectively defends models from gradient-based attacks is worth further investigation.
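The core idea in the abstract (suppressing the top logit during training so the model must push its prediction harder to compensate) can be sketched in a few lines. This is a minimal illustrative sketch, not the paper's exact formulation: the function name `harden_logits` and the suppression strength `alpha` are hypothetical, and the paper may apply the suppression differently (e.g., inside the loss rather than on the logits directly).

```python
import math

def softmax(logits):
    """Standard numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    s = sum(exps)
    return [e / s for e in exps]

def harden_logits(logits, alpha=0.5):
    """Suppress the logit of the most confident class by a fixed
    amount `alpha` (hypothetical parameter). Training the usual
    cross-entropy loss on these suppressed logits would penalize
    soft predictions, encouraging harder ones."""
    k = max(range(len(logits)), key=lambda i: logits[i])
    out = list(logits)
    out[k] -= alpha  # suppress only the top-confidence logit
    return out

# Only the argmax logit is reduced; the rest are untouched.
logits = [2.0, 1.0, 0.5]
hardened = harden_logits(logits, alpha=0.5)
```

During training, the cross-entropy loss would be computed on `hardened` instead of `logits`, so the model must raise its true-class logit further to achieve the same loss, which is one plausible reading of the hardening effect described above.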