Academic Paper

Fast GPU 3D Diffeomorphic Image Registration
Document Type
Working Paper
Source
Journal of Parallel and Distributed Computing 149:149-162, 2021
Subject
Computer Science - Distributed, Parallel, and Cluster Computing
Electrical Engineering and Systems Science - Image and Video Processing
Mathematics - Optimization and Control
68U10, 49J20, 35Q93, 65K10, 65F08, 76D55
Language
Abstract
3D image registration is one of the most fundamental and computationally expensive operations in medical image analysis. Here, we present a mixed-precision Gauss--Newton--Krylov solver for diffeomorphic registration of two images. Our work extends the publicly available CLAIRE library to GPU architectures. Despite the importance of image registration, only a few implementations of large deformation diffeomorphic registration packages support GPUs. Our contributions are new algorithms that significantly reduce the run time of the two main computational kernels in CLAIRE: calculation of derivatives and scattered-data interpolation. Specifically, we (i) deploy highly optimized, mixed-precision GPU kernels for the evaluation of scattered-data interpolation, (ii) replace fast Fourier transform (FFT)-based first-order derivatives with optimized 8th-order finite differences, and (iii) compare our solver with state-of-the-art CPU and GPU implementations. As a highlight, we demonstrate that we can register $256^3$ clinical images in less than 6 seconds on a single NVIDIA Tesla V100. This amounts to a speed-up of over 20$\times$ over the current version of CLAIRE and over 30$\times$ over existing GPU implementations.
Comment: 20 pages, 6 figures, 8 tables
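Illustrative sketch (not taken from the paper or from CLAIRE): the abstract mentions replacing FFT-based first-order derivatives with 8th-order finite differences. The snippet below shows what such a stencil looks like in one dimension, using the standard 8th-order central-difference weights. It assumes a uniform, periodic grid, which is the setting in which an FFT-based derivative would apply. The function name `fd8_periodic` and the test signal are hypothetical, and the paper's actual kernels are optimized 3D GPU code, not pure Python.

```python
import math

# Standard 8th-order central-difference weights for the first derivative,
# for offsets k = 1..4. The stencil is antisymmetric: offset -k has weight -c_k.
COEFFS = (4.0 / 5.0, -1.0 / 5.0, 4.0 / 105.0, -1.0 / 280.0)


def fd8_periodic(f, h):
    """Approximate df/dx on a uniform periodic grid with spacing h.

    Hypothetical helper for illustration only; indices wrap around
    to mimic the periodic boundary conditions assumed here.
    """
    n = len(f)
    df = [0.0] * n
    for i in range(n):
        acc = 0.0
        for k, c in enumerate(COEFFS, start=1):
            acc += c * (f[(i + k) % n] - f[(i - k) % n])
        df[i] = acc / h
    return df


# Check against a known derivative: d/dx sin(x) = cos(x) on [0, 2*pi).
n = 64
h = 2.0 * math.pi / n
x = [i * h for i in range(n)]
f = [math.sin(xi) for xi in x]
df = fd8_periodic(f, h)
max_err = max(abs(d - math.cos(xi)) for d, xi in zip(df, x))
```

Each output value depends only on eight neighboring samples, so the computation is local, in contrast to a spectral derivative, which requires a global transform of the data.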