학술논문

A Generalized Community-Structure-Aware Optimization Framework for Efficient Subgraph Matching in Social Network Analysis
Document Type
Periodical
Author
Source
IEEE Transactions on Computational Social Systems IEEE Trans. Comput. Soc. Syst. Computational Social Systems, IEEE Transactions on. 11(2):2545-2557 Apr, 2024
Subject
Computing and Processing
Communication, Networking and Broadcast Technologies
General Topics for Engineers
Pattern matching
Optimization
Computer viruses
Social networking (online)
Indexes
Computational efficiency
Influenza
Community structure
optimization
subgraph matching
Language
ISSN
2329-924X
2373-7476
Abstract
Subgraph matching is a common and important query for analyzing social networks. However, existing subgraph matching algorithms are still unsatisfactory in efficiency. As different subgraph matching algorithms are suitable for different conditions, this article aims to find optimization methods effective for different subgraph matching algorithms. Specifically, the contributions of this article mainly include three aspects. First, this article defines the concept of a community distribution scheme (c-scheme), and proposes an efficient generalized community-structure-aware optimization framework named GCF to accelerate the process of subgraph matching. Specifically, in the GCF framework, subgraph matching is performed by dealing with all c-schemes, and GCF can accelerate different subgraph matching algorithms by exploiting the community structure of the graph data. Second, three strategies are proposed and applied in GCF to further accelerate the process of subgraph matching. In detail, GCF analyzes the structure of the pattern graph and avoids redundant calculation with the strategy of two-level symmetry-breaking (2LS). Besides, GCF builds indexes on the community paths of the data graph to detect useless c-schemes in advance. Moreover, GCF performs community-structure-based boundary pruning and reduces the search space. Third, in the experiments, four state-of-the-art subgraph matching algorithms, i.e., TurboISO, VF3, VF3-Light, and DAF, are optimized with GCF. The results of the experiments conducted on both real-world and synthetic datasets show that the GCF is efficient and can significantly reduce the time consumption of different existing subgraph matching algorithms. Specifically, the optimized subgraph matching algorithms can be more than $2800\times $ faster than the original algorithms in the best condition. Moreover, the experimental results on datasets with different scales demonstrate that GCF has considerable scalability.