Academic Paper

Reliability-Aware Data Placement for Heterogeneous Memory Architecture

Document Type

Conference

Author

Gupta, Manish; Sridharan, Vilas; Roberts, David; Prodromou, Andreas; Venkat, Ashish; Tullsen, Dean; Gupta, Rajesh

Source

2018 IEEE International Symposium on High Performance Computer Architecture (HPCA), pp. 583-595, Feb. 2018

Subject

Computing and Processing
Reliability
Random access memory
Memory architecture
Bandwidth
Memory management
Transient analysis
Error correction codes
Memory
Heterogeneous Memory Architecture

Language

ISSN

2378-203X

Abstract

System reliability is a first-class concern as technology continues to shrink, resulting in increased vulnerability to traditional sources of errors such as single event upsets. By tracking access counts and the Architectural Vulnerability Factor (AVF), application data can be partitioned into groups based on how frequently it is accessed (its "hotness") and how likely it is to cause a program execution error (its "risk"). This is particularly useful for memory systems that exhibit heterogeneity in their performance and reliability, such as Heterogeneous Memory Architectures, where a typical configuration combines slow, highly reliable memory with faster, less reliable memory. This work demonstrates that current state-of-the-art, performance-focused data placement techniques adversely affect reliability. It shows that a page's risk is not necessarily correlated with its hotness; this makes it possible to identify pages that are both hot and low risk, enabling page placement strategies that balance performance and reliability. This work explores heuristics to identify and monitor both hotness and risk at run-time, and further proposes static, dynamic, and program annotation-based reliability-aware data placement techniques. This enables an architect to choose among available memories with diverse performance and reliability characteristics. The proposed heuristic-based reliability-aware data placement improves reliability by a factor of 1.6x compared to performance-focused static placement while limiting the performance degradation to 1%. A dynamic reliability-aware migration scheme, which does not require prior knowledge about the application, improves reliability by a factor of 1.5x on average while limiting the performance loss to 4.9%. Finally, program annotation-based data placement improves reliability by a factor of 1.3x at a performance cost of 1.1%.
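
The idea of placing pages that are both hot and low risk in the faster, less reliable memory can be illustrated with a minimal Python sketch. This is not the paper's algorithm: the `Page` fields, the `place_pages` function, the `max_avf` risk cutoff, and the sample values are all hypothetical, chosen only to show how hotness and risk might be combined in a static placement decision.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Page:
    page_id: int
    access_count: int  # hotness proxy (assumed metric)
    avf: float         # risk proxy in [0, 1] (assumed metric)


def place_pages(pages, fast_capacity, max_avf):
    """Split pages into (fast, reliable) lists of page ids.

    Hypothetical heuristic: only pages whose AVF is at most max_avf may
    go to the fast, less reliable memory; among those, the hottest pages
    are chosen until fast_capacity pages have been placed.
    """
    low_risk = [p for p in pages if p.avf <= max_avf]
    # Hottest first; break ties by page id for a deterministic result.
    low_risk.sort(key=lambda p: (-p.access_count, p.page_id))
    fast = [p.page_id for p in low_risk[:fast_capacity]]
    fast_set = set(fast)
    # Everything else stays in the slow, highly reliable memory.
    reliable = [p.page_id for p in pages if p.page_id not in fast_set]
    return fast, reliable


# Made-up sample data: page 1 is hot but high risk, so it stays reliable.
pages = [
    Page(0, 900, 0.05),
    Page(1, 850, 0.80),
    Page(2, 400, 0.10),
    Page(3, 20, 0.02),
]
fast, reliable = place_pages(pages, fast_capacity=2, max_avf=0.2)
print(fast, reliable)  # [0, 2] [1, 3]
```

In this sketch a purely performance-focused policy would have picked pages 0 and 1 for the fast memory; filtering on the risk proxy first swaps the high-risk page 1 for the cooler but low-risk page 2, which is the kind of trade-off the abstract describes.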
