Academic Paper

Communication Optimization for Distributed Execution of Graph Neural Networks
Document Type
Conference
Source
2023 IEEE International Parallel and Distributed Processing Symposium (IPDPS), pp. 512-523, May 2023
Subject
Bioengineering
Communication, Networking and Broadcast Technologies
Components, Circuits, Devices and Systems
Computing and Processing
Training
Distributed processing
Machine learning algorithms
Machine learning
Graph neural networks
Partitioning algorithms
Sparse matrices
Graph Neural Networks
Distributed Algorithms
Multi-GPU GNN
Performance Modeling
Language
English
ISSN
1530-2075
Abstract
Graph Neural Networks (GNNs) have emerged as a powerful and popular machine learning model for numerous application domains. Each stage of a GNN requires an aggregation (sparse matrix-matrix multiplication) and a linear operation (dense matrix-matrix multiplication). Numerous efforts have addressed the development of distributed implementations of GNNs. Although efficient algorithms for distributed matrix multiplication are well known, the challenge here is the collective optimization of the sequence of distributed matrix-matrix multiplications required for a GNN, where many degrees of freedom also exist in the ordering of the component matrix-multiplication operations. This paper develops a new approach to distributed GNN execution, ReDistribution of Matrices (RDM), centered around communication-free distributed matrix multiplication enabled by matrix redistribution between GNN stages. While the approach is applicable to the numerous algorithmic variants of GNNs, the experimental evaluation focuses on GCN (Graph Convolutional Network), including both full-batch training and sampling-based training using GraphSAINT. Experimental evaluation with 2-layer and 3-layer GCNs, using 128 or 256 hidden features, across eight sparse datasets on a multi-GPU system with 8 GPUs shows that RDM attains a geometric-mean speedup between 2× and 3.7× over two state-of-the-art multi-GPU GCN implementations, CAGNET and DGCL.
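For context, the GNN stage described in the abstract pairs a sparse aggregation with a dense linear transform. Below is a minimal single-process sketch of one such stage; the names gcn_layer, A_hat, H, and W and the ReLU nonlinearity are illustrative assumptions, not taken from the paper.

```python
import numpy as np
import scipy.sparse as sp

def gcn_layer(A_hat, H, W):
    """One GCN stage: sparse aggregation, then a dense linear operation."""
    Z = A_hat @ H                   # aggregation: sparse matrix-matrix multiply (SpMM)
    return np.maximum(Z @ W, 0.0)   # linear op (dense GEMM) + ReLU (assumed)

# Toy usage: 4 nodes, 3 input features, 2 output features
A_hat = sp.csr_matrix(np.eye(4))    # placeholder normalized adjacency
H = np.random.rand(4, 3)            # node feature matrix
W = np.random.rand(3, 2)            # layer weights
H_next = gcn_layer(A_hat, H, W)     # shape (4, 2)
```

The redistribution idea the abstract names can also be illustrated in general terms: if communication is confined to a redistribution step between stages, each stage's multiply becomes local. The following simulation of P workers sketches only that principle; it is not the paper's RDM algorithm, and the 1D row partitioning and all-gather-style redistribution are assumptions chosen for illustration.

```python
import numpy as np

P, n, f = 4, 8, 3                        # simulated workers, nodes, features
A = np.random.rand(n, n)                 # dense stand-in for the sparse adjacency
H = np.random.rand(n, f)

A_blocks = np.array_split(A, P, axis=0)  # worker p owns a row block of A ...
H_blocks = np.array_split(H, P, axis=0)  # ... and the matching row block of H

# Redistribution between stages: every worker obtains the full H
# (the only communication step, e.g. an all-gather in a real distributed run)
H_full = np.vstack(H_blocks)

# Communication-free multiply: each worker computes its row block locally
H_next = [A_p @ H_full for A_p in A_blocks]
assert np.allclose(np.vstack(H_next), A @ H)
```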