Academic Article

HASP: Hierarchical Asynchronous Parallelism for Multi-NN Tasks
Document Type
Periodical
Source
IEEE Transactions on Computers, 73(2):366-379, Feb. 2024
Subject
Computing and Processing
Task analysis
Artificial neural networks
Computational modeling
Multicore processing
Hardware
Computer architecture
Synchronization
Multi-NN
multicore architecture
AI accelerator
Language
English
ISSN
0018-9340
1557-9956
2326-3814
Abstract
The rapid development of deep learning has propelled many real-world artificial intelligence applications. Many of these applications integrate multiple neural networks (multi-NN) to provide various functionalities. Multi-NN acceleration faces two challenges: (1) competition for shared resources becomes a bottleneck, and (2) heterogeneous workloads exhibit remarkably different computing-memory characteristics and varied synchronization requirements. Therefore, resource isolation and fine-grained resource allocation for each task are two fundamental requirements for multi-NN computing systems. Although a number of multi-NN acceleration technologies have been explored, few can completely fulfill both of these requirements, especially in mobile scenarios. This paper reports a Hierarchical Asynchronous Parallel Model (HASP) that enhances multi-NN performance while meeting both requirements. HASP can be implemented on a multicore processor that adopts a Multiple Instruction Multiple Data (MIMD) or Single Instruction Multiple Thread (SIMT) architecture, with only minor adaptive modification needed. Further, a prototype chip is developed to validate the hardware effectiveness of this design. A corresponding mapping strategy is also developed, allowing the proposed architecture to simultaneously improve resource utilization and throughput. With the same workload, the prototype chip demonstrates 3.62× and 3.51× higher throughput over Planaria, and 8.68× and 2.61× over Jetson AGX Orin, for MobileNet-V1 and ResNet50, respectively.
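To make the abstract's core idea concrete, the sketch below illustrates hierarchical asynchronous parallelism in spirit only: each NN task is pinned to its own isolated partition of cores and runs without global synchronization until a final join. All names here (`CoreGroup`, `run_multi_nn`, the per-task layer counts) are hypothetical illustrations, not the paper's actual implementation or API.

```python
import concurrent.futures

class CoreGroup:
    """A partition of cores dedicated to one NN task (resource isolation)."""
    def __init__(self, name, num_cores):
        self.name = name
        self.num_cores = num_cores

    def run(self, layers):
        # Each group processes its own layer sequence asynchronously;
        # there is no barrier between groups until the final join below.
        done = 0
        for _ in range(layers):
            done += 1  # stand-in for executing one layer on this group
        return (self.name, done)

def run_multi_nn(tasks):
    """tasks: list of (name, num_cores, layers) tuples.
    Returns a dict of completed layer counts per task."""
    groups = [CoreGroup(name, cores) for name, cores, _ in tasks]
    with concurrent.futures.ThreadPoolExecutor(max_workers=len(groups)) as pool:
        futures = [pool.submit(g.run, layers)
                   for g, (_, _, layers) in zip(groups, tasks)]
        return dict(f.result() for f in futures)

# Two heterogeneous workloads with different core allocations
# (layer counts here are illustrative, not the networks' true depths).
results = run_multi_nn([("MobileNet-V1", 4, 28), ("ResNet50", 12, 50)])
print(results)
```

The key property being illustrated is that each task's progress depends only on its own core group, so a slow task cannot stall the others, which is the asynchronous half of the model; the fixed per-task partitions are the isolation half.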