Academic Paper

Mixed Precision $s$-step Conjugate Gradient with Residual Replacement on GPUs
Document Type
Conference
Source
2022 IEEE International Parallel and Distributed Processing Symposium (IPDPS), pp. 886-896, May 2022
Subject
Bioengineering
Communication, Networking and Broadcast Technologies
Components, Circuits, Devices and Systems
Computing and Processing
Distributed processing
Costs
Supercomputers
Software
Hardware
Behavioral sciences
Numerical stability
Language
ISSN
1530-2075
Abstract
The $s$-step Conjugate Gradient (CG) algorithm has the potential to reduce the communication cost of standard CG by a factor of $s$. However, although the two are mathematically equivalent, $s$-step CG may be numerically less stable than standard CG in finite precision, exhibiting slower convergence and decreased attainable accuracy. This limits the use of $s$-step CG in practice. To improve the numerical behavior of $s$-step CG and overcome this potential limitation, we incorporate two techniques. First, we improve convergence behavior by using higher precision at critical parts of the $s$-step iteration. Second, we integrate a residual replacement strategy into the resulting mixed precision $s$-step CG to improve attainable accuracy. Our experimental results on the Summit supercomputer demonstrate that when the higher precision is implemented in hardware, these techniques add virtually no overhead to the iteration time while improving both the convergence rate and the attainable accuracy of $s$-step CG. Even when the higher precision is implemented in software, these techniques may still reduce the time-to-solution (speedups of up to $1.8\times$ in our experiments), especially when $s$-step CG suffers from numerical instability with a small step size and the latency cost becomes a significant part of its iteration time.
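The idea of residual replacement in a mixed precision solver can be illustrated with a minimal sketch. This is not the authors' implementation: it uses standard CG rather than $s$-step CG, NumPy on the CPU rather than GPUs, float32 as the working precision with float64 as the "higher" precision, and a fixed replacement interval instead of an adaptive criterion. The function name `cg_residual_replacement` and the parameter `replace_every` are hypothetical and chosen only for this illustration.

```python
import numpy as np

def cg_residual_replacement(A, b, tol=1e-6, max_iter=500, replace_every=20):
    """Standard CG in float32 with a periodic float64 residual replacement.

    Illustrative assumptions: A is symmetric positive definite, the initial
    guess is zero, and the residual is replaced every `replace_every` steps.
    """
    A32, b32 = A.astype(np.float32), b.astype(np.float32)
    A64, b64 = A.astype(np.float64), b.astype(np.float64)

    x = np.zeros_like(b32)
    r = b32.copy()              # recursively updated residual (x0 = 0)
    p = r.copy()                # search direction
    rr = float(r @ r)
    b_norm = float(np.linalg.norm(b64))

    for k in range(1, max_iter + 1):
        Ap = A32 @ p
        alpha = np.float32(rr / float(p @ Ap))
        x += alpha * p
        r -= alpha * Ap

        if k % replace_every == 0:
            # Replace the recursively updated residual with the true
            # residual b - A x, evaluated in the higher precision.
            r = (b64 - A64 @ x.astype(np.float64)).astype(np.float32)

        rr_new = float(r @ r)
        if np.sqrt(rr_new) <= tol * b_norm:
            break
        p = r + np.float32(rr_new / rr) * p
        rr = rr_new

    return x, k
```

Replacing the residual limits the drift between the recursively updated residual and the true residual $b - Ax$, which is the quantity that determines attainable accuracy. The fixed interval is the simplest possible trigger and is used here only to keep the sketch short.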