Academic Paper

Accelerating Biclique Counting on GPU
Document Type
Conference
Source
2024 IEEE 40th International Conference on Data Engineering (ICDE), pp. 3191-3203, May 2024
Subject
Computing and Processing
Runtime
Scalability
Instruction sets
Memory management
Graphics processing units
Parallel processing
Data structures
biclique counting
bipartite graph
GPU
Language
ISSN
2375-026X
Abstract
Counting $(p, q)$-bicliques in bipartite graphs is a foundational problem with broad applications, ranging from densest subgraph discovery in algorithmic research to personalized content recommendation in practice. Despite its significance, current leading $(p, q)$-biclique counting algorithms fall short, particularly as graph sizes and clique scales grow. Fortunately, the problem's inherent structure, which allows the bicliques starting from each vertex to be counted independently, combined with its large number of set intersections, makes it highly amenable to parallelization. Recent successes of GPU-accelerated algorithms across various domains motivate our exploration of harnessing GPU parallelism to address the $(p, q)$-biclique counting challenge efficiently. We introduce GBC (GPU-based Biclique Counting), a novel approach designed to enable efficient and scalable $(p, q)$-biclique counting on GPUs. To address the major bottleneck arising from redundant comparisons in set intersections (which occupy an average of 90% of the runtime), we introduce a novel data structure that hashes adjacency lists into truncated bitmaps, enabling efficient set intersection on GPUs via bitwise AND operations. Our innovative hybrid DFS-BFS exploration strategy further improves thread utilization and effectively manages memory constraints. A composite load-balancing strategy, integrating pre-runtime and runtime workload allocation, ensures an equitable distribution of work among threads. Additionally, we employ vertex reordering and graph partitioning strategies for improved compactness and scalability. Experimental evaluations on eight real-life and two synthetic datasets demonstrate that GBC outperforms state-of-the-art algorithms by a substantial margin. In particular, GBC achieves an average speedup of $497.8\times$, with the largest instance achieving a $1217.7\times$ speedup when $p=q=8$.
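To make the counting problem and the role of bitwise-AND set intersection concrete, here is a minimal serial CPU sketch in Python. It is not GBC. It assumes each left-side vertex's adjacency list is stored as a plain, uncompressed bitmap in a Python int, rather than the hashed truncated bitmaps described above, and it runs a single-threaded DFS instead of GBC's GPU hybrid DFS-BFS with load balancing. The function names `build_bitmaps` and `count_bicliques` are hypothetical. The sketch relies on the identity that the number of $(p, q)$-bicliques equals the sum, over every $p$-subset $S$ of the left side, of $\binom{|N(S)|}{q}$, where $N(S)$ is the set of common neighbors of $S$.

```python
from math import comb


def popcount(x: int) -> int:
    """Number of set bits (portable alternative to int.bit_count on older Pythons)."""
    return bin(x).count("1")


def build_bitmaps(adj_u: list[list[int]], num_v: int) -> list[int]:
    """Hypothetical helper: encode each left-vertex adjacency list as an int bitmap over V."""
    bitmaps = []
    for neighbors in adj_u:
        bits = 0
        for v in neighbors:
            if not 0 <= v < num_v:
                raise ValueError(f"right-side vertex {v} out of range")
            bits |= 1 << v  # bit v is set iff the edge (u, v) exists
        bitmaps.append(bits)
    return bitmaps


def count_bicliques(adj_u: list[list[int]], num_v: int, p: int, q: int) -> int:
    """Count (p, q)-bicliques: p vertices from U, q vertices from V, all p*q edges present."""
    if p < 1 or q < 1:
        raise ValueError("p and q must be positive")
    bitmaps = build_bitmaps(adj_u, num_v)
    total = 0

    def dfs(start: int, depth: int, common: int) -> None:
        nonlocal total
        if depth == p:
            # Any q of the common neighbors complete a biclique with the chosen p vertices.
            total += comb(popcount(common), q)
            return
        for u in range(start, len(bitmaps)):
            nxt = common & bitmaps[u]  # set intersection as a single bitwise AND
            # Prune: further intersections can only shrink the set, so < q means no bicliques.
            if popcount(nxt) >= q:
                dfs(u + 1, depth + 1, nxt)

    dfs(0, 0, (1 << num_v) - 1)  # start with every right-side vertex as a candidate
    return total


if __name__ == "__main__":
    # U = {0, 1, 2}; vertex 0 -> {0, 1}, vertex 1 -> {0, 1, 2}, vertex 2 -> {1, 2}
    print(count_bicliques([[0, 1], [0, 1, 2], [1, 2]], 3, 2, 2))  # prints 2
```

In the example, the pairs {0, 1} and {1, 2} of left vertices each share exactly two right neighbors, giving one $(2, 2)$-biclique apiece, while {0, 2} share only vertex 1 and are pruned. Enumerating subsets in increasing index order ensures each $p$-subset is visited exactly once.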