Academic Paper

Numerically Stable Recurrence Relations for the Communication Hiding Pipelined Conjugate Gradient Method
Document Type
Periodical
Source
IEEE Transactions on Parallel and Distributed Systems (IEEE Trans. Parallel Distrib. Syst.), 30(11):2507-2522, Nov. 2019
Subject
Computing and Processing
Communication, Networking and Broadcast Technologies
Gradient methods
Symmetric matrices
Global communication
Pipeline processing
Numerical stability
Hardware
Krylov subspace methods
pipelining
parallel performance
global communication
latency hiding
conjugate gradients
numerical stability
inexact computations
attainable accuracy
Language
ISSN
1045-9219
1558-2183
2161-9883
Abstract
Pipelined Krylov subspace methods (also referred to as communication-hiding methods) have been proposed in the literature as a scalable alternative to classic Krylov subspace algorithms for iteratively computing the solution to a large linear system in parallel. For symmetric and positive definite system matrices the pipelined Conjugate Gradient method, p($\ell$)-CG, outperforms its classic Conjugate Gradient counterpart on large-scale distributed-memory hardware by overlapping global communication with essential computations like the matrix-vector product, thus “hiding” global communication. A well-known drawback of the pipelining technique is the (possibly significant) loss of numerical stability. In this work a numerically stable variant of the pipelined Conjugate Gradient algorithm is presented that avoids the propagation of local rounding errors in the finite precision recurrence relations that construct the Krylov subspace basis. The multi-term recurrence relation for the basis vector is replaced by $\ell$ three-term recurrences, improving stability without increasing the overall computational cost of the algorithm. The proposed modification ensures that the pipelined Conjugate Gradient method is able to attain a highly accurate solution independently of the pipeline length. Numerical experiments demonstrate a combination of excellent parallel performance and improved maximal attainable accuracy for the new pipelined Conjugate Gradient algorithm. This work thus resolves one of the major practical restrictions on the usability of pipelined Krylov subspace methods.
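
For orientation, the sketch below shows the textbook (classic) Conjugate Gradient iteration that the abstract uses as its baseline. It is not the p($\ell$)-CG algorithm of the paper and does not reproduce its recurrences. The comments mark the two dot products per iteration that, on distributed-memory hardware, each require a global reduction (for example an MPI all-reduce); these are the synchronization points that pipelined variants aim to overlap with the matrix-vector product. The function names `classic_cg` and `laplacian_1d`, the tolerance, and the test matrix are illustrative assumptions, not taken from the article.

```python
import numpy as np


def classic_cg(A, b, tol=1e-10, max_iter=1000):
    """Textbook Conjugate Gradient for a symmetric positive definite matrix A.

    Returns the approximate solution and the number of iterations performed.
    Illustrative baseline only; this is NOT the pipelined p(l)-CG method.
    """
    x = np.zeros_like(b, dtype=float)
    r = b - A @ x                 # initial residual
    p = r.copy()                  # initial search direction
    rs_old = r @ r                # dot product (a global reduction in parallel)
    b_norm = np.sqrt(b @ b)

    for k in range(max_iter):
        # Stop once the recursively updated residual is small relative to b.
        if np.sqrt(rs_old) <= tol * b_norm:
            return x, k
        Ap = A @ p                # matrix-vector product (mostly local work)
        alpha = rs_old / (p @ Ap)  # synchronization point 1: dot product
        x += alpha * p
        r -= alpha * Ap
        rs_new = r @ r            # synchronization point 2: dot product
        p = r + (rs_new / rs_old) * p
        rs_old = rs_new
    return x, max_iter


def laplacian_1d(n):
    """Dense 1D Laplacian (tridiagonal 2, -1), a standard SPD test matrix."""
    return 2.0 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)


A = laplacian_1d(50)
b = np.ones(50)
x, iters = classic_cg(A, b)
print("iterations:", iters)
print("true residual norm:", np.linalg.norm(b - A @ x))
```

In this classic form each iteration must wait for both reductions to complete before it can proceed, which is the latency that communication-hiding variants are designed to mask.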