Academic Paper

NormSoftmax: Normalizing the Input of Softmax to Accelerate and Stabilize Training
Document Type
Conference
Source
2023 IEEE International Conference on Omni-layer Intelligent Systems (COINS), pp. 1-6, Jul. 2023
Subject
Communication, Networking and Broadcast Technologies
Computing and Processing
General Topics for Engineers
Robotics and Control Systems
Training
Neural networks
Machine learning
Transformers
Robustness
Stability analysis
Probability distribution
Language
English
Abstract
Softmax is a basic function that normalizes a vector into a probability distribution and is widely used in machine learning, most notably in the cross-entropy loss function and in dot-product attention operations. However, the optimization of softmax-based models is sensitive to changes in the input statistics. We observe that the input of softmax changes significantly during the initial training stage, causing slow and unstable convergence when training the model from scratch. To remedy the optimization difficulty of softmax, we propose a simple yet effective substitute, named NormSoftmax, in which the input vector is first normalized to unit variance and then fed to the standard softmax function. Like other normalization layers in machine learning models, NormSoftmax can stabilize and accelerate the training process and also increase the robustness of training against hyperparameter choices. Experiments on Transformer-based models and convolutional neural networks validate that the proposed NormSoftmax is an effective plug-and-play module for stabilizing and speeding up the optimization of neural networks with a cross-entropy loss or dot-product attention operations.
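As a rough illustration of the idea described in the abstract, the following PyTorch sketch normalizes the softmax input to unit variance before applying the standard softmax. The function name norm_softmax, the eps constant, the biased variance estimate, and the absence of mean-centering are assumptions made for illustration, not details taken from the paper.

import torch

def norm_softmax(x: torch.Tensor, dim: int = -1, eps: float = 1e-6) -> torch.Tensor:
    # Estimate the standard deviation of the logits along `dim` and rescale them
    # to (approximately) unit variance before applying the standard softmax.
    std = x.std(dim=dim, keepdim=True, unbiased=False)
    return torch.softmax(x / (std + eps), dim=dim)

# Hypothetical usage in dot-product attention (names and shapes are illustrative):
# scores = q @ k.transpose(-2, -1)      # raw attention logits
# attn = norm_softmax(scores, dim=-1)   # NormSoftmax in place of plain softmax

Used this way, the module is plug-and-play in the sense described above: a call to softmax over cross-entropy logits or attention scores can be swapped for the normalized variant without changing the surrounding architecture.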