Academic Journal Article

Characterizing a Neutron-Induced Fault Model for Deep Neural Networks
Document Type
Periodical
Source
IEEE Transactions on Nuclear Science (IEEE Trans. Nucl. Sci.), 70(4):370-380, Apr. 2023
Subject
Nuclear Engineering
Bioengineering
Circuit faults
Graphics processing units
Reliability
Software
Error analysis
Particle beams
Hardware
Deep learning
neutron radiation effects
parallel architectures
reliability
ISSN
0018-9499
1558-1578
Abstract
Evaluating the reliability of deep neural networks (DNNs) executed on graphics processing units (GPUs) is challenging because the hardware architecture is highly complex and the software frameworks comprise many layers of abstraction. Software-level fault injection is a common and fast way to evaluate the reliability of complex applications, but it may produce unrealistic results: it has limited access to hardware resources, and the adopted fault models may be too naive (i.e., single and double bit flips). In contrast, physical fault injection with a neutron beam provides realistic error rates but lacks visibility into fault propagation. This article proposes a characterization of the DNN fault model that combines neutron beam experiments with software-level fault injection. We exposed GPUs running general matrix multiplication (GEMM) and DNNs to a neutron beam to measure their error rates. On DNNs, we observe that the percentage of critical errors can be up to 61% and show that the error correction code (ECC) is ineffective in reducing critical errors. We then performed complementary software-level fault injection using fault models derived from register-transfer level (RTL) simulations. Our results show that, when complex fault models are injected, the misdetection rate of You Only Look Once version 3 (YOLOv3) is very close to the rate measured in beam experiments, which is $8.66\times$ higher than the rate measured with fault injection using only single-bit flips.
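
To illustrate the simple fault models the abstract calls naive, the following is a minimal sketch of single and double bit-flip injection into one 32-bit floating-point value. It is not the authors' injector: the helper names `flip_bit` and `flip_bits` are hypothetical, the float32 encoding is an assumption, and the complex RTL-derived fault models described in the article are not reproduced here.

```python
import struct


def flip_bit(value: float, bit: int) -> float:
    """Return `value`, encoded as IEEE 754 float32, with one bit inverted.

    Hypothetical helper for illustration only. Bit 31 is the sign,
    bits 30..23 are the exponent, and bits 22..0 are the mantissa.
    """
    if not 0 <= bit < 32:
        raise ValueError("bit must be in [0, 31]")
    # Reinterpret the float32 bytes as an unsigned 32-bit integer.
    (raw,) = struct.unpack("<I", struct.pack("<f", value))
    # Invert the selected bit and reinterpret the result as float32.
    (flipped,) = struct.unpack("<f", struct.pack("<I", raw ^ (1 << bit)))
    return flipped


def flip_bits(value: float, bits) -> float:
    """Apply several single-bit flips in sequence (e.g. a double bit flip)."""
    for bit in bits:
        value = flip_bit(value, bit)
    return value


if __name__ == "__main__":
    weight = 1.0
    print(flip_bit(weight, 31))         # sign bit flipped
    print(flip_bit(weight, 30))         # most significant exponent bit flipped
    print(flip_bits(weight, (31, 23)))  # double bit flip
```

In this sketch, flipping the most significant exponent bit of 1.0 yields infinity, while flipping a low mantissa bit changes the value only slightly. The effect of a single-bit flip therefore depends strongly on which bit is hit.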