Academic Paper

Automatic Sparse Connectivity Learning for Neural Networks
Document Type
Periodical
Source
IEEE Transactions on Neural Networks and Learning Systems (IEEE Trans. Neural Netw. Learning Syst.), 34(10):7350-7364, Oct. 2023
Subject
Computing and Processing
Communication, Networking and Broadcast Technologies
Components, Circuits, Devices and Systems
General Topics for Engineers
Neural networks
Training
Hardware
Logic gates
Learning systems
Gaussian distribution
Computational modeling
Model compression
model pruning
neural networks
sparse connectivity learning (SCL)
trainable binary mask
Language
ISSN
2162-237X
2162-2388
Abstract
Since sparse neural networks usually contain many zero weights, these unnecessary network connections can potentially be eliminated without degrading network performance. Therefore, well-designed sparse neural networks have the potential to significantly reduce the number of floating-point operations (FLOPs) and computational resources. In this work, we propose a new automatic pruning method, sparse connectivity learning (SCL). Specifically, a weight is reparameterized as an elementwise multiplication of a trainable weight variable and a binary mask. Thus, network connectivity is fully described by the binary mask, which is modulated by a unit step function. We theoretically prove the fundamental principle of using a straight-through estimator (STE) for network pruning. This principle is that the proxy gradients of the STE should be positive, ensuring that mask variables converge to their minima. After finding that the Leaky ReLU, Softplus, and identity STEs satisfy this principle, we propose to adopt the identity STE in SCL for discrete mask relaxation. We find that the mask gradients of different features are very unbalanced; hence, we propose to normalize the mask gradients of each feature to optimize mask variable training. To train sparse masks automatically, we include the total number of network connections as a regularization term in our objective function. As SCL does not require pruning criteria or hyperparameters defined by designers for network layers, the network is explored in a larger hypothesis space to achieve optimized sparse connectivity for the best performance. SCL overcomes the limitations of existing automatic pruning methods. Experimental results demonstrate that SCL can automatically learn and select important network connections for various baseline network structures. Deep learning models trained by SCL outperform state-of-the-art human-designed and automatic pruning methods in sparsity, accuracy, and FLOPs reduction.
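The reparameterization described in the abstract can be illustrated with a minimal sketch for a single linear neuron. This is not the authors' implementation. The function names, the squared-error loss, the convention that the unit step returns 1 at zero, and the values of `lam` and `lr` are all assumptions made for illustration, and the per-feature mask gradient normalization mentioned in the abstract is omitted. The sketch shows an effective weight formed as a trainable weight times a binary mask, an identity STE that treats the derivative of the unit step as 1, and a penalty proportional to the number of active connections.

```python
# Illustrative sketch only: names, loss, and constants are assumptions,
# not taken from the paper.

def unit_step(m):
    """Binary mask from a mask variable (assumed convention: step(0) = 1)."""
    return 1.0 if m >= 0.0 else 0.0


def forward(weights, mask_vars, x):
    """Output of one linear neuron whose effective weights are w_i * step(m_i)."""
    masks = [unit_step(m) for m in mask_vars]
    y = sum(w * b * xi for w, b, xi in zip(weights, masks, x))
    return y, masks


def gradients(weights, mask_vars, x, target, lam):
    """Gradients of 0.5 * (y - target)**2 + lam * (number of active masks).

    The unit step has zero gradient almost everywhere, so the mask gradient
    uses an identity straight-through estimator: d step(m) / dm is taken as 1.
    """
    y, masks = forward(weights, mask_vars, x)
    err = y - target
    # Weight gradient: a pruned connection (mask 0) receives no update.
    grad_w = [err * b * xi for b, xi in zip(masks, x)]
    # Mask gradient: data term through the STE plus the sparsity penalty lam.
    grad_m = [err * w * xi + lam for w, xi in zip(weights, x)]
    return grad_w, grad_m


def sgd_step(weights, mask_vars, x, target, lam=0.1, lr=0.1):
    """One plain gradient-descent update of weights and mask variables."""
    grad_w, grad_m = gradients(weights, mask_vars, x, target, lam)
    new_w = [w - lr * g for w, g in zip(weights, grad_w)]
    new_m = [m - lr * g for m, g in zip(mask_vars, grad_m)]
    return new_w, new_m
```

In this sketch, the penalty `lam` pushes every mask variable downward at each step, so a connection stays active only if the data term of its mask gradient keeps pushing the mask variable back up. Because the STE passes gradients to mask variables even when the mask is 0, a pruned connection can in principle be reactivated later in training.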