Academic Paper
SDC is in the Eye of the Beholder: A Survey and Preliminary Study
Document Type
Conference
Source
2016 46th Annual IEEE/IFIP International Conference on Dependable Systems and Networks Workshop (DSN-W), pp. 72-76, June 2016
Abstract
Silent data corruptions (SDCs) are one of the most critical issues in modern HPC systems because they are "silent" by definition: they raise no warning to users or application developers that a calculation has been corrupted. Considerable effort has gone into characterizing, detecting, and tolerating SDCs. However, current approaches do not share a common understanding of what an SDC is, which makes it difficult not only to evaluate their effectiveness but also to compare them with one another. This position paper argues that SDCs should be discussed at each layer of the system and defined relative to the goal of each approach. We present preliminary results that differentiate data corruptions across system layers and show that application-specific correctness checks can tolerate about 50% of the errors that appear in the application output.
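The abstract's central point, that whether an error counts as an SDC depends on the correctness criterion of the layer judging it, can be illustrated with a small sketch. It is not the paper's method: the abstract does not describe its fault-injection setup or its correctness checks, so `flip_bit`, `classify`, the three labels ("masked", "tolerated", "sdc"), and the relative tolerance of 1e-6 are all hypothetical choices made here for illustration. A single bit flip is injected into an IEEE 754 double in the output, then judged once by a strict bitwise comparison and once by an assumed application-level tolerance check.

```python
import math
import struct
from typing import Sequence


def flip_bit(x: float, bit: int) -> float:
    """Return x with one bit of its IEEE 754 binary64 encoding inverted (bit 0 = LSB)."""
    (bits,) = struct.unpack("<Q", struct.pack("<d", x))
    (y,) = struct.unpack("<d", struct.pack("<Q", bits ^ (1 << bit)))
    return y


def bitwise_equal(golden: Sequence[float], observed: Sequence[float]) -> bool:
    """Strictest view: the output is correct only if every bit of every value matches."""
    return all(struct.pack("<d", g) == struct.pack("<d", o)
               for g, o in zip(golden, observed))


def app_check(golden: Sequence[float], observed: Sequence[float],
              rel_tol: float = 1e-6) -> bool:
    """Hypothetical application-specific check: values within a relative tolerance pass.
    NaN or inf produced by a fault never compares close, so it fails this check."""
    return all(math.isclose(g, o, rel_tol=rel_tol) for g, o in zip(golden, observed))


def classify(golden: Sequence[float], observed: Sequence[float],
             rel_tol: float = 1e-6) -> str:
    """Illustrative labels only, not the paper's taxonomy:
    'masked'    - output is bit-identical to the fault-free run,
    'tolerated' - output differs in bits but passes the application check,
    'sdc'       - output is wrong from the application's point of view."""
    if len(golden) != len(observed):
        return "sdc"
    if bitwise_equal(golden, observed):
        return "masked"
    if app_check(golden, observed, rel_tol):
        return "tolerated"
    return "sdc"


if __name__ == "__main__":
    golden = [1.0, 2.5, 3.75]  # hypothetical fault-free output

    low_mantissa = list(golden)
    low_mantissa[1] = flip_bit(low_mantissa[1], 0)  # 2.5 -> 2.5000000000000004

    exponent = list(golden)
    exponent[0] = flip_bit(exponent[0], 52)  # 1.0 -> 0.5 (lowest exponent bit)

    print(classify(golden, golden))        # masked
    print(classify(golden, low_mantissa))  # tolerated
    print(classify(golden, exponent))      # sdc
```

Under the bitwise view both injected faults corrupt the output; under the assumed tolerance check only the exponent flip does. That gap is the kind of layer-dependent disagreement the abstract describes, although the exact fraction of tolerated errors depends entirely on the application and its check, and the sketch does not reproduce the paper's roughly 50% figure.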