학술논문

Performance Evaluation of Scale-Free Graph Algorithms in Low Latency Non-volatile Memory
Document Type
Conference
Source
2017 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW) IPDPSW Parallel and Distributed Processing Symposium Workshops (IPDPSW), 2017 IEEE International. :1021-1028 May, 2017
Subject
Bioengineering
Communication, Networking and Broadcast Technologies
Components, Circuits, Devices and Systems
Computing and Processing
Power, Energy and Industry Applications
Signal Processing and Analysis
Nonvolatile memory
Random access memory
Heuristic algorithms
Arrays
Benchmark testing
Flash memories
Non-Volatile Memory
Performance Evaluation
Scale-Free Graph Algorithms
Language
Abstract
The purpose of this study is to quantitatively assess the performance of graph processing algorithms for large scale-free graphs residing in byte-addressable Non-Volatile Memory (NVM). Our study focuses on static and dynamic graph algorithms previously optimized for external memory in the form of locally attached NAND Flash arrays, with data structures tuned to maximize locality. The evaluation is run on a unique resource, an NVM hardware emulator from Intel capable of inserting delays to memory reads through microcode instructions thatdelay load instructions missing in L3; the emulated NVM appears as separate UMA node (identified by a physical address range) and that is not part of the socket-attached NUMA nodes. In this work, we distinguish two graph processing configurations, 'semi-external' in which the graph is fully resident in NVM but in-flight intermediate data structures reside in DRAM, and 'fully external' in which both the graph and the intermediate data structures reside in NVM. Our goal is to assess the performance impact of NVM latency of up to 3.5X DRAM, with (semi-external) and without (fully external) an application-specific scratchpad for the in-flight data structures. We find a performance penalty of 59.6% in the fully external scenario, which is reduced to 5.2% with the scratchpad. Our results show that graph algorithms employing locality aware data structure layout and processing can benefit immediately from emerging NVMs with minimal performance impact, making NVM a high value resource for large scale graph processing.