Academic Paper

High performance deep neural network on low cost mobile GPU
Document Type
Conference
Source
2016 IEEE International Conference on Consumer Electronics (ICCE), pp. 69-70, Jan. 2016
Subject
Bioengineering
Communication, Networking and Broadcast Technologies
Computing and Processing
Engineered Materials, Dielectrics and Plasmas
Engineering Profession
Fields, Waves and Electromagnetics
General Topics for Engineers
Photonics and Electrooptics
Power, Energy and Industry Applications
Transportation
Graphics processing units
Kernel
Mobile communication
Convolution
Optimization
System-on-chip
Acceleration
Language
English
ISSN
2158-4001
Abstract
In recent years, machine learning based on deep neural networks (DNN) has been playing an increasingly important role, and artificial intelligence applications using DNNs are achieving ever higher accuracy. However, the multi-layer structure of a DNN results in huge computational complexity and power consumption. To feasibly run DNN applications on mobile devices, an efficient DNN flow optimized for a mobile GPU is desired. In this paper, a mobile-GPU-accelerated DNN flow is proposed. With the proposed input buffer address remapping scheme, shader assembly code optimization, and kernel merging between computing nodes, 10.6 FPS is achieved on a 35.2 GFLOPS mobile GPU at 94.9 mJ per frame, a 58x speedup and 104x better energy efficiency compared to a pure mobile CPU solution. Compared with state-of-the-art GPU accelerator devices and libraries, the proposed scheme provides 226%∼1000% higher computing efficiency.
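
Note on the kernel-merging optimization mentioned in the abstract: the following is a minimal CPU-side sketch, under assumptions not stated in the record, of what merging two computing nodes (here, a 1-D convolution followed by a ReLU activation) into a single pass looks like. The layer choice, sizes, and the function names conv_then_relu / conv_relu_fused are illustrative assumptions, not the paper's actual shader code; the point is only that the fused version removes the intermediate buffer and the second pass over the data.

/* Unmerged vs. merged computing nodes: the unmerged flow writes an
 * intermediate buffer between convolution and activation (two kernel
 * launches on a GPU), while the merged flow applies the activation
 * inside the convolution loop. Illustrative sketch only. */
#include <stdio.h>

#define N 8          /* illustrative 1-D signal length */
#define K 3          /* illustrative 1-D kernel width  */

/* Unmerged flow: conv writes an intermediate buffer, ReLU reads it back. */
static void conv_then_relu(const float *in, const float *w, float *out) {
    float tmp[N] = {0};                      /* intermediate buffer */
    for (int i = 0; i <= N - K; ++i)
        for (int k = 0; k < K; ++k)
            tmp[i] += in[i + k] * w[k];
    for (int i = 0; i <= N - K; ++i)
        out[i] = tmp[i] > 0.0f ? tmp[i] : 0.0f;
    for (int i = N - K + 1; i < N; ++i)
        out[i] = 0.0f;
}

/* Merged flow: convolution and ReLU fused into one pass, no intermediate buffer. */
static void conv_relu_fused(const float *in, const float *w, float *out) {
    for (int i = 0; i <= N - K; ++i) {
        float acc = 0.0f;
        for (int k = 0; k < K; ++k)
            acc += in[i + k] * w[k];
        out[i] = acc > 0.0f ? acc : 0.0f;    /* activation applied in place */
    }
    for (int i = N - K + 1; i < N; ++i)
        out[i] = 0.0f;
}

int main(void) {
    const float in[N] = {1, -2, 3, -4, 5, -6, 7, -8};
    const float w[K]  = {0.25f, 0.5f, 0.25f};
    float a[N], b[N];

    conv_then_relu(in, w, a);
    conv_relu_fused(in, w, b);

    for (int i = 0; i < N; ++i)              /* both flows give the same result */
        printf("%d: %.3f %.3f\n", i, a[i], b[i]);
    return 0;
}

In the paper's GPU flow, this kind of fusion would presumably correspond to merging two shader kernel launches into one, cutting the intermediate buffer traffic between computing nodes, which is consistent with the reported speed and energy gains.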