Academic Paper

Quantization of Generative Adversarial Networks for Efficient Inference: A Methodological Study
Document Type
Conference
Source
2022 26th International Conference on Pattern Recognition (ICPR), pp. 2179-2185, Aug. 2022
Subject
Computing and Processing
Robotics and Control Systems
Signal Processing and Analysis
Performance evaluation
Training
Quantization (signal)
Computational modeling
Semantics
Computer architecture
Neural network compression
Language
English
ISSN
2831-7475
Abstract
Generative adversarial networks (GANs) have an enormous potential impact on digital content creation, e.g., photorealistic digital avatars, semantic content editing, and quality enhancement of speech and images. However, the performance of modern GANs comes at the cost of massive amounts of computation at inference time and high energy consumption, which complicates, or even prevents, their deployment on edge devices. The problem can be mitigated with quantization, a neural network compression technique that facilitates hardware-friendly inference by replacing floating-point computations with low-bit integer ones. While quantization is well established for discriminative models, the performance of modern quantization techniques applied to GANs remains unclear. GANs generate content of a more complex structure than discriminative models produce, which makes their quantization significantly more challenging. To tackle this problem, we perform an extensive experimental study of state-of-the-art quantization techniques on three diverse GAN architectures, namely StyleGAN, Self-Attention GAN, and CycleGAN. As a result, we discovered practical recipes that allowed us to successfully quantize these models for inference with 4/8-bit weights and 8-bit activations while preserving the quality of the original full-precision models.
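For readers unfamiliar with the technique, the following is a minimal sketch of the kind of uniform "fake" quantization the abstract alludes to: mapping float tensors onto a low-bit integer grid and back to simulate 4/8-bit weights and 8-bit activations. It is an illustrative assumption, not the paper's specific method; the function name, the symmetric per-tensor scheme, and the NumPy simulation are all choices made here for clarity.

```python
import numpy as np

def quantize_dequantize(x: np.ndarray, num_bits: int = 8) -> np.ndarray:
    """Simulate symmetric uniform quantization of a tensor to num_bits.

    Rounds float values onto a signed integer grid and maps them back,
    the standard "fake quantization" used to evaluate low-bit inference.
    """
    qmax = 2 ** (num_bits - 1) - 1       # e.g. 127 for 8-bit, 7 for 4-bit
    scale = np.max(np.abs(x)) / qmax     # per-tensor scale from the value range
    if scale == 0:                       # all-zero tensor: nothing to quantize
        return x
    q = np.clip(np.round(x / scale), -qmax - 1, qmax)  # integer grid
    return q * scale                     # dequantize for float simulation

# Hypothetical example: 4-bit weights and 8-bit activations, as in the study.
weights = np.random.randn(256, 256).astype(np.float32)
activations = np.random.rand(1, 256).astype(np.float32)

w_q = quantize_dequantize(weights, num_bits=4)
a_q = quantize_dequantize(activations, num_bits=8)
print("mean absolute weight error:", np.abs(weights - w_q).mean())
```

In practice, the quantization parameters (scale, and a zero-point for asymmetric schemes) are calibrated per layer or per channel, which is where much of the GAN-specific difficulty studied in the paper arises.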