Academic Paper

Automatic Sparse Connectivity Learning for Neural Networks

Document Type

Periodical

Author

Tang, Z.; Luo, L.; Xie, B.; Zhu, Y.; Zhao, R.; Bi, L.; Lu, C.

Source

IEEE Transactions on Neural Networks and Learning Systems (IEEE Trans. Neural Netw. Learning Syst.), 34(10):7350-7364, Oct. 2023

Subject

Computing and Processing
Communication, Networking and Broadcast Technologies
Components, Circuits, Devices and Systems
General Topics for Engineers
Neural networks
Training
Hardware
Logic gates
Learning systems
Gaussian distribution
Computational modeling
Model compression
model pruning
neural networks
sparse connectivity learning (SCL)
trainable binary mask

ISSN

2162-237X
2162-2388

Abstract

Since sparse neural networks usually contain many zero weights, these unnecessary network connections can potentially be eliminated without degrading network performance. Therefore, well-designed sparse neural networks have the potential to significantly reduce the number of floating-point operations (FLOPs) and computational resources. In this work, we propose a new automatic pruning method, sparse connectivity learning (SCL). Specifically, a weight is reparameterized as an elementwise multiplication of a trainable weight variable and a binary mask. Thus, network connectivity is fully described by the binary mask, which is modulated by a unit step function. We theoretically prove the fundamental principle of using a straight-through estimator (STE) for network pruning. This principle is that the proxy gradients of STE should be positive, ensuring that mask variables converge at their minima. After finding that Leaky ReLU, Softplus, and identity STEs satisfy this principle, we propose adopting the identity STE in SCL for discrete mask relaxation. We find that mask gradients of different features are very unbalanced; hence, we propose normalizing the mask gradients of each feature to optimize mask variable training. To train sparse masks automatically, we include the total number of network connections as a regularization term in our objective function. As SCL does not require pruning criteria or hyperparameters defined by designers for network layers, the network is explored in a larger hypothesis space to achieve optimized sparse connectivity for the best performance. SCL overcomes the limitations of existing automatic pruning methods. Experimental results demonstrate that SCL can automatically learn and select important network connections for various baseline network structures. Deep learning models trained by SCL outperform the state-of-the-art human-designed and automatic pruning methods in sparsity, accuracy, and FLOPs reduction.
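
The reparameterization described in the abstract can be illustrated with a minimal pure-Python sketch. It is not the authors' implementation. The function names, the step threshold at 0, the regularization strength `lam`, and the learning rate are all assumptions made for illustration, and the per-feature gradient normalization mentioned in the abstract is omitted. The sketch shows an effective weight formed as weight times unit-step mask, and a proxy mask gradient under an identity STE, where the derivative of the step is replaced by 1, so that a connection-count regularizer contributes a constant positive gradient.

```python
def unit_step(m):
    """Binary mask value for one mask variable (threshold at 0 is an assumption)."""
    return 1.0 if m >= 0.0 else 0.0


def effective_weights(weights, mask_vars):
    """Elementwise product of trainable weights and their binary masks."""
    return [w * unit_step(m) for w, m in zip(weights, mask_vars)]


def mask_gradients(grad_eff, weights, lam):
    """Proxy gradients w.r.t. mask variables under an identity STE.

    The step's derivative is replaced by 1, so for each connection:
    dL/dm = dL/d(effective weight) * w + lam, where lam comes from the
    connection-count regularizer lam * sum(unit_step(m)).
    """
    return [g * w + lam for g, w in zip(grad_eff, weights)]


def sgd_step(values, grads, lr):
    """Plain gradient descent update."""
    return [v - lr * g for v, g in zip(values, grads)]


def count_connections(mask_vars):
    """Number of active connections, i.e. mask variables at or above 0."""
    return int(sum(unit_step(m) for m in mask_vars))


# Toy illustration: with a flat task loss, only the regularizer acts,
# pushing every mask variable down until weakly held connections switch off.
weights = [0.5, -1.2, 2.0]
mask_vars = [0.05, 1.0, -0.3]
task_grads = [0.0, 0.0, 0.0]  # hypothetical: task loss contributes nothing here

grads = mask_gradients(task_grads, weights, lam=0.1)
new_mask_vars = sgd_step(mask_vars, grads, lr=1.0)

print(count_connections(mask_vars), "->", count_connections(new_mask_vars))
```

In a real training run the task-loss gradient would counteract the regularizer for connections that matter, so only connections whose contribution does not justify their cost would drift below the threshold.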
