Journal Article

HAS-RL: A Hierarchical Approximate Scheme Optimized With Reinforcement Learning for NoC-Based NN Accelerators
Document Type
Periodical
Source
IEEE Transactions on Circuits and Systems I: Regular Papers, 71(4):1863-1875, Apr. 2024
Subject
Components, Circuits, Devices and Systems
Delays
Reinforcement learning
Heuristic algorithms
Approximation algorithms
Traffic control
Routing
Regulation
Offline reinforcement learning
neural network
approximate communication
network-on-chip
Language
English
ISSN
1549-8328 (print)
1558-0806 (electronic)
Abstract
Network-on-Chip (NoC) is a scalable on-chip communication architecture for NN accelerators, but as the number of nodes increases, the communication delay grows. Applications such as machine learning have a certain resilience to noisy or erroneous transmitted data. Approximate communication is therefore a promising way to improve performance by reducing traffic load under a constraint on the maximum acceptable accuracy loss of the neural network. Balancing result quality against communication delay is a key issue for approximate NoC systems. Traditional approximate NoCs consider only node-to-node, approximation-based dynamic traffic regulation. However, traffic patterns that change dynamically across nodes, over time, and across applications create a huge search space, which makes it hard to find an optimal global approximation solution. In this paper, we propose a quality model for different neural networks that captures the relationship between quality loss and the data approximate rate. We then propose a hierarchical approximate scheme optimized with reinforcement learning (HAS-RL) and reduce its complexity by shrinking the state space and action space, which also reduces the resource overhead. Finally, we embed a global approximate controller in the NoC system and deploy in it a policy network, trained with an offline reinforcement learning algorithm, that adjusts the data approximate rate of each node at run time. Compared with the state-of-the-art method, the proposed scheme reduces the average network delay by 13.5% with similar accuracy. HAS-RL incurs an additional area overhead of only 1.24% and power consumption of 0.77% compared with a traditional router design.
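To make the controller's role concrete, the sketch below illustrates how a run-time policy might map per-node traffic load to discrete data approximate rates. It is a minimal illustration, not the authors' implementation: the 4x4 mesh size, the rate levels, the load-based state features, and the random-weight MLP standing in for the offline-trained policy network are all assumptions introduced here.

    # Illustrative sketch (not the paper's implementation): a tiny policy
    # that maps a coarse per-node NoC traffic state to discrete data
    # approximate rates, the way a global approximate controller might
    # at run time. Sizes, rate levels, and weights are assumptions.
    import numpy as np

    N_NODES = 16                                    # assumed 4x4 NoC mesh
    RATE_LEVELS = np.array([0.0, 0.25, 0.5, 0.75])  # assumed discrete rates

    rng = np.random.default_rng(0)

    # One-hidden-layer MLP with random weights standing in for a policy
    # network trained offline with RL. State: per-node load in [0, 1].
    W1 = rng.normal(scale=0.1, size=(N_NODES, 32))
    b1 = np.zeros(32)
    W2 = rng.normal(scale=0.1, size=(32, N_NODES * len(RATE_LEVELS)))
    b2 = np.zeros(N_NODES * len(RATE_LEVELS))

    def policy(state: np.ndarray) -> np.ndarray:
        """Greedily pick one approximate-rate level per node."""
        h = np.tanh(state @ W1 + b1)
        logits = (h @ W2 + b2).reshape(N_NODES, len(RATE_LEVELS))
        return RATE_LEVELS[logits.argmax(axis=1)]

    load = rng.uniform(size=N_NODES)  # measured per-node traffic load
    rates = policy(load)              # per-node data approximate rates
    print(rates)

In a real deployment the chosen rates would be checked against the quality model, so that the per-network relationship between data approximate rate and quality loss keeps the accuracy degradation within the acceptable bound described in the abstract.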