학술논문

Training Multi-Bit Quantized and Binarized Networks with a Learnable Symmetric Quantizer
Document Type
Periodical
Source
IEEE Access Access, IEEE. 9:47194-47203 2021
Subject
Aerospace
Bioengineering
Communication, Networking and Broadcast Technologies
Components, Circuits, Devices and Systems
Computing and Processing
Engineered Materials, Dielectrics and Plasmas
Engineering Profession
Fields, Waves and Electromagnetics
General Topics for Engineers
Geoscience
Nuclear Engineering
Photonics and Electrooptics
Power, Energy and Industry Applications
Robotics and Control Systems
Signal Processing and Analysis
Transportation
Quantization (signal)
Training
Computational modeling
Neural networks
Memory management
Knowledge engineering
Learnable quantizer
quantization
binarization
model compression
machine learning
neuromorphic computing
Language
ISSN
2169-3536
Abstract
Quantizing weights and activations of deep neural networks is essential for deploying them in resource-constrained devices, or cloud platforms for at-scale services. While binarization is a special case of quantization, this extreme case often leads to several training difficulties, and necessitates specialized models and training methods. As a result, recent quantization methods do not provide binarization, thus losing the most resource-efficient option, and quantized and binarized networks have been distinct research areas. We examine binarization difficulties in a quantization framework and find that all we need to enable the binary training are a symmetric quantizer, good initialization, and careful hyperparameter selection. These techniques also lead to substantial improvements in multi-bit quantization. We demonstrate our unified quantization framework, denoted as UniQ, on the ImageNet dataset with various architectures such as ResNet-18,-34 and MobileNetV2. For multi-bit quantization, UniQ outperforms existing methods to achieve the state-of-the-art accuracy. In binarization, the achieved accuracy is comparable to existing state-of-the-art methods even without modifying the original architectures.