Journal Article

An Empirical Study on Adaptive Inference for Pretrained Language Model
Document Type
Periodical
Source
IEEE Transactions on Neural Networks and Learning Systems, 34(8):4321-4331, Aug. 2023
Subject
Computing and Processing
Communication, Networking and Broadcast Technologies
Components, Circuits, Devices and Systems
General Topics for Engineers
Adaptation models
Bit error rate
Task analysis
Inference mechanisms
Transformers
Computational modeling
Mathematical models
Adaptive inference
bidirectional encoder representations from transformers (BERT)
distillation
FastPLM
pretrained language model (PLM)
Language
English
ISSN
2162-237X (Print)
2162-2388 (Electronic)
Abstract
Adaptive inference has been proven to improve the inference speed of bidirectional encoder representations from transformers (BERT) with minimal loss of accuracy. However, existing work focuses only on the BERT model and does not explore other pretrained language models (PLMs). This article therefore conducts an empirical study of the adaptive inference mechanism across various PLMs, including generative pretraining (GPT), GCNN, ALBERT, and TinyBERT. The mechanism is evaluated on both English and Chinese benchmarks, and the experimental results demonstrate that it achieves speedups ranging from 1 to 10 times depending on the chosen speed threshold. In addition, its application to ALBERT shows that adaptive inference is compatible with parameter sharing, achieving model compression and acceleration simultaneously, while its application to TinyBERT shows that it can further accelerate an already distilled small model. To address the problem that a large label set can render adaptive inference ineffective, this article also proposes a solution, namely label reduction. Finally, this article open-sources an easy-to-use toolkit called FastPLM to help developers adopt pretrained models with adaptive inference capabilities in their applications.
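For context, the adaptive inference mechanism studied here attaches a small classifier to each encoder layer and stops the forward pass as soon as the intermediate prediction is sufficiently certain, so easy inputs exit early while hard inputs use more layers. Below is a minimal PyTorch sketch of this idea; the function name, the normalized-entropy exit criterion, and the exit_classifiers argument are illustrative assumptions, not FastPLM's actual API.

    import torch

    def adaptive_inference(layers, exit_classifiers, hidden, threshold=0.3):
        # Hypothetical sketch: run encoder layers sequentially and exit early
        # once the per-layer classifier's prediction is certain enough
        # (normalized entropy below `threshold`). `layers` maps [B, T, D] to
        # [B, T, D]; each classifier maps the [CLS] state [B, D] to logits.
        probs = None
        for layer, classifier in zip(layers, exit_classifiers):
            hidden = layer(hidden)                  # one transformer layer
            logits = classifier(hidden[:, 0])       # predict from [CLS] token
            probs = torch.softmax(logits, dim=-1)
            # Normalized entropy in [0, 1] as the uncertainty measure.
            entropy = -(probs * probs.clamp_min(1e-12).log()).sum(dim=-1)
            entropy = entropy / torch.log(torch.tensor(float(probs.size(-1))))
            if entropy.max().item() < threshold:    # confident: stop early
                break
        return probs

Under this scheme the threshold directly controls the speed-accuracy trade-off: a value near 0 demands near-total certainty and so runs all layers, while larger values exit sooner, which is consistent with the 1x-10x speedup range the abstract reports for different speed thresholds.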