Academic Paper

Fast Execution of Simultaneous Breadth-First Searches on Sparse Graphs
Document Type
Conference
Source
2015 IEEE 21st International Conference on Parallel and Distributed Systems (ICPADS), pp. 9-18, Dec. 2015
Subject
Communication, Networking and Broadcast Technologies
Computing and Processing
Graphics processing units
Algorithm design and analysis
Search problems
Parallel processing
Hardware
Manuals
Measurement
Language
ISSN
1521-9097
Abstract
The construction of efficient parallel graph algorithms is important for quickly solving problems in areas such as urban planning, social network analysis, and hardware verification. Existing GPU implementations of graph algorithms tend to be monolithic, so contributions from the literature are typically rebuilt rather than reused. Recent work has focused on traversal-based abstractions that efficiently execute a single breadth-first search or enact algorithms in the “think like a vertex” paradigm. However, graph analytics such as the all-pairs shortest paths (APSP) problem, diameter computations, betweenness centrality, and reachability querying require the execution of many such graph traversals. Typically, these traversals are independent of one another and can thus be executed in parallel. This paper presents multi-search, a simple abstraction designed for graph algorithms requiring many breadth-first searches that can be executed simultaneously. Although algorithms have implicitly leveraged this abstraction in the past, we provide an explicit, reusable implementation that efficiently maps this abstraction to the GPU, performing more than twice as fast as previous approaches on large graphs of varying diameter. This approach allows us to scale our APSP implementation to graphs with millions of vertices using a single GPU, whereas prior approaches were either constrained to much smaller graph instances or required large supercomputers to process graphs of similar size. To show the flexibility of our abstraction, we use it to express betweenness centrality and achieve an average speedup of more than 5.82x over parallel CPU implementations from existing frameworks and an average speedup of 2.24x over a manual, highly optimized GPU implementation of the algorithm.
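The idea of running many independent breadth-first searches together can be illustrated with a small CPU-side sketch. This is not the paper's GPU implementation: the function name `multi_search`, the adjacency-list input, and the per-vertex bitmask bookkeeping are all assumptions chosen for illustration. Each search gets one bit, and all searches advance one level at a time in a shared sweep over the graph.

```python
from typing import List


def multi_search(adj: List[List[int]], sources: List[int]) -> List[List[int]]:
    """Run one BFS per source, advancing all searches level by level together.

    Hypothetical helper, not taken from the paper.
    Returns dist[i][v] = hop distance from sources[i] to v, or -1 if unreachable.
    """
    n = len(adj)
    k = len(sources)
    dist = [[-1] * n for _ in range(k)]
    seen = [0] * n      # seen[v]: bitmask of searches that have reached v
    frontier = [0] * n  # frontier[v]: bitmask of searches with v on their frontier

    for i, s in enumerate(sources):
        seen[s] |= 1 << i
        frontier[s] |= 1 << i
        dist[i][s] = 0

    level = 0
    while any(frontier):
        level += 1
        next_frontier = [0] * n
        # Expand every frontier vertex once, on behalf of all searches at it.
        for u in range(n):
            mask = frontier[u]
            if mask == 0:
                continue
            for v in adj[u]:
                # Only searches that have not yet visited v discover it now.
                next_frontier[v] |= mask & ~seen[v]
        # Commit discoveries after the sweep so the level stays synchronous.
        for v in range(n):
            new = next_frontier[v]
            if new:
                seen[v] |= new
                i = 0
                while new:
                    if new & 1:
                        dist[i][v] = level
                    new >>= 1
                    i += 1
        frontier = next_frontier
    return dist


if __name__ == "__main__":
    # Path 0-1-2-3 plus an isolated vertex 4; search from both ends of the path.
    graph = [[1], [0, 2], [1, 3], [2], []]
    for row in multi_search(graph, [0, 3]):
        print(row)
```

Passing every vertex as a source yields unweighted all-pairs shortest path distances, which is the kind of workload the abstract describes. A GPU version would distribute the edge sweep across threads, a detail this sketch leaves out.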