Academic Paper

A Methodology for Selective Protection of Matrix Multiplications: A Diagnostic Coverage and Performance Trade-off for CNNs Executed on GPUs
Document Type
Conference
Source
2022 6th International Conference on System Reliability and Safety (ICSRS), pp. 9-18, Nov. 2022
Subject
Power, Energy and Industry Applications
Sensitivity analysis
Autonomous systems
Object detection
Detectors
Parallel processing
Real-time systems
Safety
GPU
CNN
protection
matrix multiplication
fault detection
Abstract
The ability of CNNs to perform complex functions such as object detection efficiently and accurately has fostered their adoption in safety-related autonomous systems. These algorithms require high-performance computing platforms that exploit high levels of parallelism. Functional safety standards therefore require the detection, control, and mitigation of random errors in these underlying platforms. In this paper, we propose protecting the most computationally expensive operation of CNNs, the matrix multiplication, with a catalog of diagnostic techniques. However, this protection entails a performance penalty, and protecting the complete CNN may be unaffordable for systems operating under strict real-time constraints. This paper proposes a three-stage methodology to selectively protect CNN layers and achieve the required trade-off between diagnostic coverage and performance: i) per-layer analysis of sensitivity to misclassification using a statistical fault injection campaign, ii) layer-by-layer performance impact and diagnostic coverage analysis, and iii) selective layer protection. Furthermore, we propose a strategy to effectively compute the achievable diagnostic coverage of large matrices implemented on GPUs. Finally, we apply the proposed methodology and strategy to Tiny YOLO-v3, an object detector based on CNNs.
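The abstract does not list the diagnostic techniques in its catalog. As an illustration only, the sketch below assumes one well-known option, a checksum test in the style of algorithm-based fault tolerance: the column sums of the product C = A·B must equal (eᵀA)·B, where e is a vector of ones. The function names, the tolerance `tol`, and the `fault` hook are hypothetical and are not taken from the paper; the code runs on the CPU in plain Python rather than on a GPU.

```python
def matmul(a, b):
    """Plain matrix product of two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)]
            for row in a]


def checked_matmul(a, b, tol=1e-9, fault=None):
    """Multiply a by b and verify the result with a column checksum.

    fault is a hypothetical hook (row, col, delta) that corrupts one
    output element to mimic a random hardware error.
    Returns (c, ok); ok is False when the checksum test fails.
    """
    c = matmul(a, b)
    if fault is not None:
        i, j, delta = fault
        c[i][j] += delta  # injected error, for demonstration only

    # Checksum row of a: the sum of each column of a, i.e. e^T A.
    a_check = [sum(col) for col in zip(*a)]
    # Expected column sums of the product: (e^T A) B.
    expected = matmul([a_check], b)[0]
    # Column sums actually observed in the computed product.
    actual = [sum(col) for col in zip(*c)]

    ok = all(abs(e - s) <= tol for e, s in zip(expected, actual))
    return c, ok


if __name__ == "__main__":
    a = [[1, 2], [3, 4]]
    b = [[5, 6], [7, 8]]
    print(checked_matmul(a, b))                   # fault-free run
    print(checked_matmul(a, b, fault=(0, 1, 1)))  # corrupted run
```

The check costs one extra vector-matrix product and two passes of column sums per multiplication. This is the kind of overhead that the selective, per-layer protection described in the abstract would trade against diagnostic coverage. With floating-point data, `tol` would need to be tuned to the rounding error expected for the matrix size.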