Academic Paper

High performance deep neural network on low cost mobile GPU
Document Type
Conference
Source
2016 IEEE International Conference on Consumer Electronics (ICCE), pp. 69-70, Jan. 2016
Subject
Bioengineering
Communication, Networking and Broadcast Technologies
Computing and Processing
Engineered Materials, Dielectrics and Plasmas
Engineering Profession
Fields, Waves and Electromagnetics
General Topics for Engineers
Photonics and Electrooptics
Power, Energy and Industry Applications
Transportation
Graphics processing units
Kernel
Mobile communication
Convolution
Optimization
System-on-chip
Acceleration
Language
English
ISSN
2158-4001
Abstract
In recent years, machine learning based on deep neural networks (DNN) has been playing an increasingly important role, and artificial intelligence applications using DNNs are achieving ever higher accuracy. However, the multi-layer structure of a DNN results in huge computational complexity and power consumption. To feasibly run DNN applications on mobile devices, an efficient DNN flow optimized for a mobile GPU is desired. In this paper, a mobile-GPU-accelerated DNN flow is proposed. With the proposed input buffer address remapping scheme, shader assembly code optimization, and kernel merging between computing nodes, 10.6 FPS is achieved on a 35.2 GFLOPS mobile GPU at 94.9 mJ per frame, a 58x speedup and 104x better energy efficiency compared to a pure mobile CPU solution. Compared with state-of-the-art GPU accelerator devices and libraries, the proposed scheme provides 226%∼1000% higher computing efficiency.
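
Note on the kernel-merging optimization mentioned in the abstract: the following is a minimal CPU-side sketch, under assumptions not stated in the record, of what merging two computing nodes (here, a 1-D convolution followed by a ReLU activation) into a single pass looks like. The layer choice, sizes, and the function names conv_then_relu / conv_relu_fused are illustrative assumptions, not the paper's actual shader code; the point is only that the fused version removes the intermediate buffer and the second pass over the data.

/* Unmerged vs. merged computing nodes: the unmerged flow writes an
 * intermediate buffer between convolution and activation (two kernel
 * launches on a GPU), while the merged flow applies the activation
 * inside the convolution loop. Illustrative sketch only. */
#include <stdio.h>

#define N 8          /* illustrative 1-D signal length */
#define K 3          /* illustrative 1-D kernel width  */

/* Unmerged flow: conv writes an intermediate buffer, ReLU reads it back. */
static void conv_then_relu(const float *in, const float *w, float *out) {
    float tmp[N] = {0};                      /* intermediate buffer */
    for (int i = 0; i <= N - K; ++i)
        for (int k = 0; k < K; ++k)
            tmp[i] += in[i + k] * w[k];
    for (int i = 0; i <= N - K; ++i)
        out[i] = tmp[i] > 0.0f ? tmp[i] : 0.0f;
    for (int i = N - K + 1; i < N; ++i)
        out[i] = 0.0f;
}

/* Merged flow: convolution and ReLU fused into one pass, no intermediate buffer. */
static void conv_relu_fused(const float *in, const float *w, float *out) {
    for (int i = 0; i <= N - K; ++i) {
        float acc = 0.0f;
        for (int k = 0; k < K; ++k)
            acc += in[i + k] * w[k];
        out[i] = acc > 0.0f ? acc : 0.0f;    /* activation applied in place */
    }
    for (int i = N - K + 1; i < N; ++i)
        out[i] = 0.0f;
}

int main(void) {
    const float in[N] = {1, -2, 3, -4, 5, -6, 7, -8};
    const float w[K]  = {0.25f, 0.5f, 0.25f};
    float a[N], b[N];

    conv_then_relu(in, w, a);
    conv_relu_fused(in, w, b);

    for (int i = 0; i < N; ++i)              /* both flows give the same result */
        printf("%d: %.3f %.3f\n", i, a[i], b[i]);
    return 0;
}

In the paper's GPU flow, this kind of fusion would presumably correspond to merging two shader kernel launches into one, cutting the intermediate buffer traffic between computing nodes, which is consistent with the reported speed and energy gains.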