Academic Paper

An unsupervised learning-guided multi-node failure-recovery model for distributed graph processing systems.
Document Type
Article
Source
Journal of Supercomputing. Jun 2023, Vol. 79 Issue 9, p9383-9408. 26p.
Subject
*DISTRIBUTED computing
*SYSTEM failures
*BIG data
*HIERARCHICAL clustering (Cluster analysis)
Language
ISSN
0920-8542
Abstract
Big data applications based on graphs need to scale efficiently to handle the immense growth in graph size. Scalable graph processing typically handles high workloads by increasing the number of computing nodes. However, this increases the chance of single-node or multiple-node (multi-node) failures. Failures may occur during normal job execution as well as during recovery. Most existing failure-recovery systems follow either checkpoint-based recovery, which has a high computation cost, or replication, which has a high memory overhead. In this work, we propose an unsupervised learning-based failure-recovery scheme for graph processing systems that detects different kinds of failures and recovers nodes in a shorter time. Compared with traditional failure-recovery models, it performs better with respect to simultaneous recovery from single-node and multi-node failures, memory overhead, and computational latency. An evaluation on four benchmark datasets confirms its effectiveness and indicates that the proposed model fits in well with current practice. [ABSTRACT FROM AUTHOR]
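The abstract does not describe how the unsupervised detector works, so the sketch below is only a generic illustration of the idea named in the subject terms (hierarchical clustering) applied to failure detection, not the authors' method. It assumes each node reports a small, hypothetical feature vector (here, heartbeat delay in seconds and the fraction of supersteps completed). It then runs bottom-up single-linkage agglomerative clustering down to two clusters and flags the smaller cluster as suspected failed. All function names, metrics, and values are illustrative assumptions.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def single_linkage(points, k):
    """Agglomerative (bottom-up) hierarchical clustering with single linkage.

    Starts with one cluster per point and repeatedly merges the two clusters
    whose closest members are nearest, until k clusters remain.
    Returns a list of clusters, each a list of point indices.
    """
    clusters = [[i] for i in range(len(points))]
    while len(clusters) > k:
        best = None  # (distance, index_a, index_b) of the closest pair so far
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                d = min(euclidean(points[i], points[j])
                        for i in clusters[a] for j in clusters[b])
                if best is None or d < best[0]:
                    best = (d, a, b)
        _, a, b = best
        clusters[a].extend(clusters[b])  # merge b into a (a < b, so a's index is stable)
        del clusters[b]
    return clusters


def suspect_failed_nodes(metrics):
    """Hypothetical detector: split nodes into two clusters by their metrics
    and return the names in the smaller cluster as suspected failures."""
    names = list(metrics)
    points = [metrics[n] for n in names]
    clusters = single_linkage(points, k=2)
    minority = min(clusters, key=len)
    return sorted(names[i] for i in minority)


if __name__ == "__main__":
    # Hypothetical per-node metrics: (heartbeat delay in s, fraction of supersteps done)
    nodes = {
        "n0": (0.10, 1.00), "n1": (0.20, 0.98), "n2": (0.15, 0.99),
        "n3": (0.12, 1.00), "n4": (5.00, 0.10), "n5": (4.80, 0.12),
    }
    print(suspect_failed_nodes(nodes))  # ['n4', 'n5']
```

Two nodes that stop responding together form their own tight cluster, so the sketch flags a multi-node failure as one group rather than node by node. One design caveat: with a fixed k=2 the sketch always flags some minority, even in a fully healthy cluster. A real detector would also need a distance threshold (or a dendrogram cut) before declaring any node failed.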