Academic paper

A64FX performance: experience on Ookami
Document Type
Conference
Source
2021 IEEE International Conference on Cluster Computing (CLUSTER), pp. 711-718, Sep. 2021
Subject
Computing and Processing
Limiting
Fast Fourier transforms
Conferences
Measurement uncertainty
Cluster computing
Tools
Libraries
high-performance computing
Language
ISSN
2168-9253
Abstract
We examine the performance of scientific and engineering kernels on the Fujitsu A64FX processor, both out-of-the-box using various toolchains and with processor-specific optimizations. While nearly all applications port with little to no modification, significant performance variation is observed between the toolchains. This variation depends heavily upon characteristics of the application (most notably its use of mathematical functions) and is also constrained by the most performant toolchains having limited support for recent language standards. As expected, high performance demands that a kernel be vectorized, be multi-threaded, and localize its memory references. Detailed optimizations, including use of intrinsics, are also examined to understand performance gaps and what is necessary to attain peak performance. This article employs the Ookami computer technology testbed funded by the American National Science Foundation. The system provides researchers worldwide with access to 176 Fujitsu A64FX compute nodes as well as other state-of-the-art technology.