Academic Paper

Quantization for Bayesian Deep Learning: Low-Precision Characterization and Robustness
Document Type
Conference
Source
2023 IEEE International Symposium on Workload Characterization (IISWC), pp. 180-192, Oct. 2023
Subject
Components, Circuits, Devices and Systems
Computing and Processing
Power, Energy and Industry Applications
Deep learning
Uniform resource locators
Quantization (signal)
Uncertainty
Codes
Computational modeling
Neural networks
Language
English
ISSN
2835-2238
Abstract
Bayesian deep learning is an emerging field for building robust and trustworthy AI systems because it can estimate reliable uncertainty in neural networks. The need to model distributions over parameters and to run multiple Monte Carlo forward passes in Bayesian neural networks leads to larger model sizes and a significant increase in inference latency compared to deterministic models, which poses challenges for practical deployment. Quantization can reduce model size and speed up inference through low-precision computation. In this work, we propose and evaluate a quantization framework and workflow for Bayesian deep learning workloads that leverages 8-bit integer (INT8) operations to accelerate inference on the 4th Gen Intel Xeon Scalable processor (formerly codenamed Sapphire Rapids). We demonstrate that our quantization workflow achieves a 6.9x inference throughput speedup on the ImageNet benchmark without sacrificing model accuracy or the quality of uncertainty estimates. Furthermore, we evaluate the effects of quantization on Bayesian neural networks with respect to generalizability, robustness against data drift, and uncertainty estimation on large-scale datasets, including a real-world safety-critical application. Our code has been integrated into an open-source project and is available on GitHub at https://github.com/IntelLabs/bayesian-torch.
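To make the two ingredients the abstract combines concrete, here is a minimal, self-contained sketch of symmetric per-tensor INT8 quantization applied inside Monte Carlo forward passes of a toy one-layer Bayesian "network". This is not the paper's actual workflow (which uses the Bayesian-Torch library and hardware INT8 kernels on Xeon processors); the function names, the Gaussian weight posterior, and the single-layer model are all illustrative assumptions.

```python
import random
import statistics

def quantize_int8(weights):
    """Symmetric per-tensor INT8 quantization: map floats to integers in [-127, 127]."""
    scale = max(abs(w) for w in weights) / 127.0
    q = [max(-127, min(127, round(w / scale))) for w in weights]
    return q, scale

def dequantize(q, scale):
    """Recover approximate float weights from INT8 values and the scale factor."""
    return [v * scale for v in q]

def mc_predict(weight_mean, weight_std, x, num_samples=50, rng=None):
    """Monte Carlo inference for a toy linear 'network':
    sample weights from a Gaussian posterior (an assumption for this sketch),
    quantize each sample to INT8, and return the predictive mean and
    standard deviation (the uncertainty estimate)."""
    rng = rng or random.Random(0)
    outputs = []
    for _ in range(num_samples):
        w = [rng.gauss(m, s) for m, s in zip(weight_mean, weight_std)]
        q, scale = quantize_int8(w)
        w_deq = dequantize(q, scale)
        outputs.append(sum(wi * xi for wi, xi in zip(w_deq, x)))
    return statistics.mean(outputs), statistics.stdev(outputs)

mean, std = mc_predict([0.5, -0.3], [0.05, 0.05], [1.0, 2.0])
```

The per-tensor quantization error is bounded by half the scale factor per weight, which is why (as the paper reports for the real workloads) the predictive distribution can survive INT8 inference largely intact.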