Academic Journal

Parthenon—a performance portable block-structured adaptive mesh refinement framework.
Document Type
Article
Source
International Journal of High Performance Computing Applications. Sep2023, Vol. 37 Issue 5, p465-486. 22p.
Subject
*COMPUTER architecture
*PARALLEL programming
*MESH networks
*MAGNETOHYDRODYNAMICS
*WIRELESS mesh networks
*HYDRODYNAMICS
Language
English
ISSN
1094-3420
Abstract
On the path to exascale, the landscape of computer device architectures and corresponding programming models has become much more diverse. While various low-level performance portable programming models are available, support at the application level lags behind. To address this issue, we present the performance portable block-structured adaptive mesh refinement (AMR) framework Parthenon, derived from the well-tested and widely used Athena++ astrophysical magnetohydrodynamics code, but generalized to serve as the foundation for a variety of downstream multi-physics codes. Parthenon adopts the Kokkos programming model and provides various levels of abstraction, from multidimensional variables, to packages defining and separating components, to the launching of parallel compute kernels. Parthenon allocates all data in device memory to reduce data movement, supports the logical packing of variables and mesh blocks to reduce kernel launch overhead, and employs one-sided, asynchronous MPI calls to reduce communication overhead in multi-node simulations. Using a hydrodynamics miniapp, we demonstrate weak and strong scaling on various architectures including AMD and NVIDIA GPUs, Intel and AMD x86 CPUs, IBM Power9 CPUs, as well as Fujitsu A64FX CPUs. At the largest scale on Frontier (the first TOP500 exascale machine), the miniapp reaches a total of 1.7 × 10¹³ zone-cycles/s on 9216 nodes (73,728 logical GPUs) at ≈92% weak-scaling parallel efficiency (starting from a single node). In combination with being an open, collaborative project, this makes Parthenon an ideal framework for targeting exascale simulations, in which downstream developers can focus on their specific application rather than on the complexity of handling massively parallel, device-accelerated AMR. [ABSTRACT FROM AUTHOR]
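The quoted weak-scaling figure can be unpacked with a short calculation. The total throughput, node count, and efficiency below come from the abstract; the implied single-node baseline is inferred from them and is not stated in the record:

```python
# Weak-scaling parallel efficiency is per-node throughput at scale
# divided by the single-node baseline throughput.
total_rate = 1.7e13   # zone-cycles/s on 9216 Frontier nodes (from the abstract)
nodes = 9216          # node count (from the abstract)
efficiency = 0.92     # ~92% weak-scaling efficiency (from the abstract)

per_node_rate = total_rate / nodes
# Rearranging efficiency = per_node_rate / baseline gives the implied
# single-node baseline (an inference, not a number from the record):
implied_single_node = per_node_rate / efficiency

print(f"per-node rate at scale:   {per_node_rate:.2e} zone-cycles/s")
print(f"implied single-node rate: {implied_single_node:.2e} zone-cycles/s")
```

This back-of-the-envelope check shows the miniapp sustains roughly 1.8 × 10⁹ zone-cycles/s per node at full scale, consistent with a single-node baseline near 2 × 10⁹ zone-cycles/s at 92% efficiency.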