Academic Paper

Fast Execution of Simultaneous Breadth-First Searches on Sparse Graphs

Document Type

Conference

Author

Source

2015 IEEE 21st International Conference on Parallel and Distributed Systems (ICPADS), pp. 9-18, Dec. 2015

Subject

Communication, Networking and Broadcast Technologies
Computing and Processing
Graphics processing units
Algorithm design and analysis
Search problems
Parallel processing
Hardware
Manuals
Measurement

Language

ISSN

1521-9097

Abstract

The construction of efficient parallel graph algorithms is important for quickly solving problems in areas such as urban planning, social network analysis, and hardware verification. Existing GPU implementations of graph algorithms tend to be monolithic, and thus contributions from the literature are typically rebuilt rather than reused. Recent work has focused on traversal-based abstractions that efficiently execute a single breadth-first search or enact algorithms in the “think like a vertex” paradigm. However, graph analytics such as the all-pairs shortest paths (APSP) problem, diameter computations, betweenness centrality, and reachability querying require the execution of many such graph traversals. Typically, these traversals are independent of one another and can thus be executed in parallel. This paper presents multi-search, a simple abstraction that is designed for graph algorithms requiring many breadth-first searches that can be executed simultaneously. Although algorithms have implicitly leveraged this abstraction in the past, we provide an explicit, reusable implementation that efficiently maps this abstraction to the GPU, performing more than twice as fast as previous approaches on large graphs of varying diameter. This approach allows us to scale our APSP implementation to graphs with millions of vertices using a single GPU, whereas prior approaches were either constrained to much smaller graph instances or required large supercomputers to process graphs of similar size. To show the flexibility of our abstraction, we use it to express betweenness centrality and achieve more than a 5.82x average speedup over parallel CPU implementations from existing frameworks and a 2.24x average speedup over a manual, highly optimized GPU implementation of the algorithm.
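
The idea of running many independent breadth-first searches over one sparse graph can be illustrated with a small sequential sketch. This is not the paper's GPU implementation: the function names `build_csr` and `multi_search`, the CSR layout, and the use of -1 for unreachable vertices are all assumptions made for illustration. The point it shows is that each search keeps its own distance array and frontier, so the work for different sources never interacts and could in principle be mapped to parallel threads.

```python
from typing import List, Sequence, Tuple


def build_csr(num_vertices: int,
              edges: Sequence[Tuple[int, int]]) -> Tuple[List[int], List[int]]:
    """Build a compressed sparse row (CSR) layout for an undirected graph.

    Hypothetical helper: returns (offsets, columns), where the neighbors of
    vertex u are columns[offsets[u]:offsets[u + 1]].
    """
    degree = [0] * num_vertices
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
    offsets = [0] * (num_vertices + 1)
    for v in range(num_vertices):
        offsets[v + 1] = offsets[v] + degree[v]
    cursor = offsets[:-1]  # next free slot per vertex (slicing makes a copy)
    columns = [0] * offsets[-1]
    for u, v in edges:
        columns[cursor[u]] = v
        cursor[u] += 1
        columns[cursor[v]] = u
        cursor[v] += 1
    return offsets, columns


def multi_search(offsets: List[int], columns: List[int],
                 sources: Sequence[int]) -> List[List[int]]:
    """Run one level-synchronous BFS per source over the same CSR graph.

    Returns dist, where dist[i][v] is the hop count from sources[i] to v,
    or -1 if v is unreachable (an assumed convention).
    """
    n = len(offsets) - 1
    dist = [[-1] * n for _ in sources]
    frontiers = []
    for i, s in enumerate(sources):
        dist[i][s] = 0
        frontiers.append([s])
    level = 0
    while any(frontiers):  # stop once every search has an empty frontier
        level += 1
        next_frontiers = []
        # Each (search, frontier vertex) pair is independent work; this loop
        # nest is the part a parallel implementation would distribute.
        for i, frontier in enumerate(frontiers):
            nxt = []
            for u in frontier:
                for k in range(offsets[u], offsets[u + 1]):
                    v = columns[k]
                    if dist[i][v] == -1:
                        dist[i][v] = level
                        nxt.append(v)
            next_frontiers.append(nxt)
        frontiers = next_frontiers
    return dist
```

Under these assumptions, passing every vertex as a source yields unweighted all-pairs shortest path distances, and the largest finite entry gives the diameter of a connected graph.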
