Academic Article

Enabling Highly-Scalable Remote Memory Access Programming with MPI-3 One Sided
Document Type
Working Paper
Source
Proceedings of the ACM/IEEE International Conference on High Performance Computing, Networking, Storage and Analysis, pages 53:1--53:12, November 2013
Subject
Computer Science - Distributed, Parallel, and Cluster Computing
Computer Science - Performance
C.5.1
J.2
Language
English
Abstract
Modern interconnects offer remote direct memory access (RDMA) features. Yet, most applications rely on explicit message passing for communication despite its unwanted overheads. The MPI-3.0 standard defines a programming interface for exploiting RDMA networks directly; however, its scalability and practicability have to be demonstrated in practice. In this work, we develop scalable bufferless protocols that implement the MPI-3.0 specification. Our protocols support scaling to millions of cores with negligible memory consumption while providing the highest performance and minimal overheads. To arm programmers, we provide a spectrum of performance models for all critical functions and demonstrate the usability of our library and models with several application studies with up to half a million processes. We show that our design is comparable to, or better than, UPC and Fortran Coarrays in terms of latency, bandwidth, and message rate. We also demonstrate application performance improvements with comparable programming complexity.
Comment: Best Paper Award at ACM/IEEE Supercomputing'13 (1/92) and Best Student Paper finalist (8/92); the source code of foMPI can be downloaded from http://spcl.inf.ethz.ch/Research/Parallel_Programming/foMPI