Academic Paper
Serving DNNs in Real Time at Datacenter Scale with Project Brainwave
Document Type
Periodical
Author
Chung, E.; Fowers, J.; Ovtcharov, K.; Papamichael, M.; Caulfield, A.; Massengill, T.; Liu, M.; Lo, D.; Alkalay, S.; Haselman, M.; Abeydeera, M.; Adams, L.; Angepat, H.; Boehn, C.; Chiou, D.; Firestein, O.; Forin, A.; Gatlin, K.S.; Ghandi, M.; Heil, S.; Holohan, K.; El Husseini, A.; Juhasz, T.; Kagi, K.; Kovvuri, R.K.; Lanka, S.; Van Megen, F.; Mukhortov, D.; Patel, P.; Perez, B.; Rapsang, A.; Reinhardt, S.; Rouhani, B.; Sapek, A.; Seera, R.; Shekar, S.; Sridharan, B.; Weisz, G.; Woods, L.; Yi Xiao, P.; Zhang, D.; Zhao, R.; Burger, D.
Source
IEEE Micro, 38(2):8-20, Apr. 2018
Subject
Language
ISSN
0272-1732
1937-4143
Abstract
To meet the computational demands of deep learning, cloud operators are turning toward specialized hardware for improved efficiency and performance. Project Brainwave, Microsoft's principal infrastructure for real-time AI serving, accelerates deep neural network (DNN) inferencing in major services such as Bing's intelligent search features and Azure. Exploiting distributed model parallelism and pinning over low-latency hardware microservices, Project Brainwave serves state-of-the-art, pre-trained DNN models with high efficiency at low batch sizes. A high-performance, precision-adaptable FPGA soft processor is at the heart of the system, achieving up to 39.5 teraflops (Tflops) of effective performance at Batch 1 on a state-of-the-art Intel Stratix 10 FPGA.