Academic Paper

Autotuning LSTM for Accelerated Execution on Edge
Document Type
Conference
Source
2021 IEEE International Symposium on Circuits and Systems (ISCAS), pp. 1-5, May 2021
Subject
Components, Circuits, Devices and Systems
Keywords
Privacy; Runtime; Computational modeling; Search methods; Integrated circuit modeling; Artificial intelligence; Long short term memory; Deep Learning; Edge computing; ASR; NLP; Halide; Compiler
Language
ISSN
2158-1525
Abstract
Deployment of Deep Neural Networks (DNNs) on edge devices is highly desirable to address user privacy concerns and to minimize the turnaround time of AI applications. However, executing DNN models on a battery-operated device requires a highly optimized implementation specific to the target hardware. Moreover, since different layers of a DNN exhibit distinct computation and memory characteristics, it is imperative to optimize each layer separately. This is in contrast to the widely deployed library-based approach, where all configurations of a DNN operation share the same implementation. In this paper, we address this issue by autotuning the implementation of Long Short-Term Memory (LSTM) operations, which are widely used in sequence-based AI applications. To exhaustively search the space of optimizations and their parameters, we develop a high-level autotuning framework based on Halide. We use grid search to find the parameters that lead to minimum runtime, and further present a TPE-based search method that finds a near-optimal runtime in a limited number of trials. We observe a 2.2×–3.1× speedup in execution time for the LSTM layers used in the widely deployed GNMT and DeepSpeech2 models.
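The exhaustive grid search the abstract describes can be sketched in plain Python. The schedule parameters below (tile size, vector width, unroll factor) and the `measure_runtime` cost model are illustrative stand-ins, not the authors' actual framework; a real autotuner would compile a Halide schedule with each candidate configuration and time it on the target edge device.

```python
import itertools

def measure_runtime(tile, vec, unroll):
    # Stand-in cost model for illustration only. In the paper's setting,
    # this step would build the LSTM kernel with the given Halide schedule
    # parameters and measure its wall-clock execution time on the device.
    return abs(tile - 64) * 0.01 + abs(vec - 8) * 0.05 + abs(unroll - 4) * 0.02 + 1.0

def grid_search(tiles, vecs, unrolls):
    # Try every combination of schedule parameters and keep the fastest.
    best_cfg, best_time = None, float("inf")
    for cfg in itertools.product(tiles, vecs, unrolls):
        t = measure_runtime(*cfg)
        if t < best_time:
            best_cfg, best_time = cfg, t
    return best_cfg, best_time

best_cfg, best_time = grid_search([16, 32, 64, 128], [4, 8, 16], [1, 2, 4, 8])
print(best_cfg, best_time)
```

Grid search guarantees the minimum over the enumerated space but grows multiplicatively with each parameter; the TPE-based search mentioned in the abstract replaces the exhaustive loop with a sequential model-based optimizer that proposes promising configurations, reaching near-optimal runtime in far fewer trials.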