학술논문

Accelerating Graph Analytics on CPU-FPGA Heterogeneous Platform

Document Type

Conference

Author

Source

2017 29th International Symposium on Computer Architecture and High Performance Computing (SBAC-PAD) SBAC-PAD Computer Architecture and High Performance Computing (SBAC-PAD), 2017 29th International Symposium on. :137-144 Oct, 2017

Subject

Communication, Networking and Broadcast Technologies
Components, Circuits, Devices and Systems
Computing and Processing
Field programmable gate arrays
Arrays
Acceleration
Central Processing Unit
Partitioning algorithms
Algorithm design and analysis

Language

Abstract

Hardware accelerators for graph analytics have gained increasing interest. Vertex-centric and edge-centric paradigms are widely used to design graph analytics accelerators. However, both of them have notable drawbacks: vertex-centric paradigm requires random memory accesses to traverse edges and edge-centric paradigm results in redundant edge traversals. In this paper, we explore the tradeoffs between vertex-centric and edge-centric paradigms and propose a hybrid algorithm which dynamically selects between them during the execution. We introduce the notion of active vertex ratio, based on which we develop a simple but efficient paradigm selection approach. We develop a hybrid data structure to concurrently support vertex-centric and edge-centric paradigms. Based on the hybrid data structure, we propose a graph partitioning scheme to increase parallelism and enable efficient parallel computation on heterogeneous platforms. In each iteration, we use our paradigm selection approach to select the appropriate paradigm for each partition. Further, we map our hybrid algorithm onto a stateof- the-art heterogeneous platform which integrates a multi-core CPU and a Field-Programmable Gate Array (FPGA) in a cache coherent fashion. We use our design methodology to accelerate two fundamental graph algorithms, breadth-first search (BFS) and single-source shortest path (SSSP). Experimental results show that our CPU-FPGA co-processing achieves up to 1.5× (1.9×) speedup for BFS (SSSP) compared with optimized baseline designs. Compared with the state-of-the-art FPGA-based designs, our design achieves up to 4.0× (4.2×) throughput improvement for BFS (SSSP). Compared with a state-of-the-art multi-core design, our design demonstrates up to 1.5× (1.8×) speedup for BFS (SSSP).

Online Access

Full Text (IEEE) Find it@PNU

이메일

부산대학교 도서관

Online Access

메일 발송