학술논문

Exploiting Reduced Precision for GPU-based Time Series Mining
Document Type
Conference
Source
2022 IEEE International Parallel and Distributed Processing Symposium (IPDPS) IPDPS Parallel and Distributed Processing Symposium (IPDPS), 2022 IEEE International. :124-134 May, 2022
Subject
Bioengineering
Communication, Networking and Broadcast Technologies
Components, Circuits, Devices and Systems
Computing and Processing
Time series analysis
Layout
Graphics processing units
Switches
Data mining
Synchronization
Parallel algorithms
Multi-GPU algorithms
data mining
reduced precision
multi-dimensional time series
matrix profile
Language
ISSN
1530-2075
Abstract
The mining of multi-dimensional time series is a crucial step in gaining insights into data obtained from physical systems and from monitoring infrastructures. A widely accepted approach for this challenge is the matrix profile, which, however, is computationally very expensive. It relies on calculating large correlation matrices coupled with sort operations across all dimensions of the data, as well as on performing inclusive scans. All of these steps are inherently data parallel and can, therefore, benefit from execution on GPUs, and even more so from horizontal scaling on multiple GPUs. In addition, the nature of the matrix profile calculation allows the exploitation of reduced precision on GPUs. This offers further improvements to enable the analysis of ever growing data sets in real-world scenarios. Based on these motivations, we introduce the first parallel algorithm for multi-dimensional matrix profile on multiple GPUs exploiting reduced precision modes and provide a highly opti-mized implementation using novel optimization techniques. On one NVIDIA A100 GPU, our implementation achieves a 54x performance improvement in comparison to an optimized single-node execution on a state-of-the-art CPU-based implementation relying on double-precision computation and an additional factor of 1.4x when switching to reduced precision while maintaining sufficient accuracy. We study the accuracy and performance trade-offs for our proposed algorithm in detail and present synthetic and real-world case studies to demonstrate how the reduced precision improves the performance, while accomplishing sufficiently accurate results.