Academic Paper

Architecting a chunk-based memory race recorder in Modern CMPs
Document Type
Conference
Source
2009 42nd Annual IEEE/ACM International Symposium on Microarchitecture (MICRO-42), pp. 576-586, Dec. 2009
Subject
Computing and Processing
Components, Circuits, Devices and Systems
Coherence
Degradation
Interleaved codes
Yarn
Hardware
Protocols
Permission
Writing
Programming profession
Multiprocessing systems
Memory Race Recorder
Deterministic Replay
Determinism
ISSN
1072-4451
2379-3155
Abstract
Prior work on hardware support for memory race recording piggybacks timestamps on coherence messages and logs the outcome of memory races using point-to-point or chunk-based approaches. These memory race recorder (MRR) techniques are effective, but they require modifications to the cache coherence protocol that can hurt performance. In addition, prior work has mostly focused on directory coherence and considered only CMP systems with single-level cache hierarchies. Most modern CMP systems shipped today, however, implement snoop coherence and feature multilevel cache hierarchies. To be practical, an MRR must target CMPs with multilevel caches, mitigate the coherence overhead due to piggybacking, and emphasize replay speed to broaden the applicability of deterministic replay. This paper contributes three new solutions for making chunk-based MRR practical for modern CMPs. We show that MRR interactions with a cache hierarchy can degrade performance and present a novel mechanism that mitigates this degradation. We propose new mechanisms for snoop-based caches that eliminate the coherence traffic overhead due to piggybacking. Finally, we propose new techniques for improving replay speed and introduce a novel framework for evaluating the replay speed potential of MRR designs.
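To make the chunk-based approach mentioned in the abstract concrete, the following is a minimal software sketch of a generic chunk-based recorder. It is not the mechanism proposed in this paper. Each core tracks the read and write sets of its open chunk. A conflicting remote access forces the core to close its chunk and log it as a (timestamp, size) pair. The responder's Lamport-style timestamp is piggybacked on every coherence reply, which is the kind of traffic overhead the paper targets. Replay then executes whole chunks in timestamp order. Several simplifying assumptions apply: exact sets instead of hardware signatures, infinite private caches with no evictions, a single cache level, and the hypothetical names `Core`, `record`, and `replay`.

```python
from dataclasses import dataclass, field


@dataclass
class Core:
    """Hypothetical per-core recorder state (infinite private cache assumed)."""
    cid: int
    ts: int = 0                                  # Lamport-style chunk timestamp
    size: int = 0                                # accesses in the open chunk
    reads: set = field(default_factory=set)      # read set of the open chunk
    writes: set = field(default_factory=set)     # write set of the open chunk
    cached: set = field(default_factory=set)     # lines held in the private cache
    log: list = field(default_factory=list)      # (timestamp, chunk_size) entries

    def end_chunk(self):
        """Close the open chunk, log it, and advance the clock."""
        if self.size:
            self.log.append((self.ts, self.size))
            self.ts += 1
        self.size = 0
        self.reads.clear()
        self.writes.clear()


def record(programs, schedule):
    """Run programs in the interleaving given by `schedule` (core ids) and log chunks.

    Each program is a list of ('r', addr) or ('w', addr, value) operations.
    Returns (per-core chunk logs, per-core lists of values returned by reads).
    """
    cores = [Core(i) for i in range(len(programs))]
    mem, pc = {}, [0] * len(programs)
    seen = [[] for _ in programs]
    for cid in schedule:
        op = programs[cid][pc[cid]]
        pc[cid] += 1
        me, addr, is_write = cores[cid], op[1], op[0] == 'w'
        for other in cores:
            if other is me or addr not in other.cached:
                continue
            # A remote write conflicts with a read or write; a remote read with a write.
            if addr in other.writes or (is_write and addr in other.reads):
                other.end_chunk()
            # The responder's timestamp rides on the coherence reply (conservatively, always).
            me.ts = max(me.ts, other.ts)
            if is_write:
                other.cached.discard(addr)       # invalidate the remote copy
        me.cached.add(addr)
        if is_write:
            me.writes.add(addr)
            mem[addr] = op[2]
        else:
            me.reads.add(addr)
            seen[cid].append(mem.get(addr, 0))
        me.size += 1
    for c in cores:
        c.end_chunk()                            # flush the last open chunks
    return [c.log for c in cores], seen


def replay(programs, logs):
    """Re-execute whole chunks atomically in (timestamp, core id) order."""
    chunks = sorted((ts, cid, n) for cid, log in enumerate(logs) for ts, n in log)
    mem, pc = {}, [0] * len(programs)
    seen = [[] for _ in programs]
    for _, cid, n in chunks:
        for op in programs[cid][pc[cid]:pc[cid] + n]:
            if op[0] == 'w':
                mem[op[1]] = op[2]
            else:
                seen[cid].append(mem.get(op[1], 0))
        pc[cid] += n
    return seen
```

For example, with core 0 running `[('w', 'x', 1), ('r', 'y')]`, core 1 running `[('w', 'y', 2), ('r', 'x')]`, and the schedule `[0, 1, 0, 1]`, `record` returns the logs `[[(1, 2)], [(0, 1), (2, 1)]]`. Replaying these chunks reproduces the recorded read values `[[2], [1]]`. Because conflicting chunks always receive strictly increasing timestamps, sorting by timestamp yields a serial order that respects every recorded race.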