Academic Paper

IExchange: Asynchronous Communication and Termination Detection for Iterative Algorithms
Document Type
Conference
Source
2021 IEEE 11th Symposium on Large Data Analysis and Visualization (LDAV), pp. 12-21, Oct. 2021
Subject
Computing and Processing
Asynchronous communication
Protocols
Program processors
Data analysis
Parallel programming
Data visualization
Computational efficiency
Computing methodologies
Parallel computing methodologies
Parallel algorithms
Massively parallel algorithms
Abstract
Iterative parallel algorithms can be implemented by synchronizing after each round. This bulk-synchronous parallel (BSP) pattern is inefficient when strict synchronization is not required: global synchronization is costly at scale and prohibits amortizing load imbalance over the entire execution, and termination detection is challenging with irregular data-dependent communication. We present an asynchronous communication protocol that efficiently interleaves communication with computation. The protocol includes global termination detection without obstructing computation and communication between nodes. The user's computational primitive only needs to indicate when local work is done; our algorithm detects when all processors reach this state. We do not assume that global work decreases monotonically, allowing processors to create new work. We illustrate the utility of our solution through experiments, including two large data analysis and visualization codes: parallel particle advection and distributed union-find. Our asynchronous algorithm is several times faster with better strong scaling efficiency than the synchronous approach.
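Illustrative sketch (not part of the catalog record or the paper): the abstract's idea that each processor only reports when its local work is done, while a separate check decides global termination, can be mimicked in a single-process Python simulation. Everything here is an assumption made for illustration: the `Worker` class, the `run` function, the rule for spawning new work, and the use of sent/received message counts as the termination test. It is not the IExchange protocol itself, which the abstract does not specify in detail.

```python
from collections import deque


class Worker:
    """Hypothetical simulated processor with an inbox and message counters."""

    def __init__(self, rank):
        self.rank = rank
        self.inbox = deque()
        self.sent = 0
        self.received = 0

    def local_done(self):
        # The only thing the "user primitive" reports: no local work left.
        return not self.inbox


def run(num_workers, seeds):
    """Simulate workers exchanging work until global termination is detected.

    seeds is a list of (rank, value) pairs. Processing a value v > 0 creates
    new work (v - 1, and v // 2 when v is even), so the amount of outstanding
    work can grow before it shrinks.
    """
    workers = [Worker(r) for r in range(num_workers)]

    def send(src, dst, value):
        src.sent += 1
        workers[dst].inbox.append(value)

    # Seed work is injected as a self-send so the counters stay balanced.
    for rank, value in seeds:
        send(workers[rank], rank, value)

    processed = 0
    while True:
        for w in workers:
            if w.inbox:
                value = w.inbox.popleft()
                w.received += 1
                processed += 1
                if value > 0:
                    send(w, (w.rank + 1) % num_workers, value - 1)
                    if value % 2 == 0:
                        send(w, (w.rank + 2) % num_workers, value // 2)

        # Global check: everyone is locally done and no message is in flight.
        total_sent = sum(w.sent for w in workers)
        total_received = sum(w.received for w in workers)
        if all(w.local_done() for w in workers) and total_sent == total_received:
            return processed, total_sent, total_received


if __name__ == "__main__":
    print(run(4, [(0, 3)]))
```

In this sequential simulation the counter comparison is redundant with the idle check, since nothing is ever in flight between steps. It is kept because in a real distributed run a processor can look idle while a message addressed to it is still in transit, and that is the case a counting-based test is meant to catch.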