Academic Paper

DMRlib: Easy-Coding and Efficient Resource Management for Job Malleability
Document Type
Periodical
Source
IEEE Transactions on Computers (IEEE Trans. Comput.), 70(9):1443-1457, Sep. 2021
Subject
Computing and Processing
Runtime
Libraries
Resource management
Programming
Standards
Syntactics
Throughput
Processes reconfiguration
MPI malleability
job elastic resize
dynamic reallocation of resources
productivity-aware computation
Language
ISSN
0018-9340
1557-9956
2326-3814
Abstract
Process malleability has proved to have a highly positive impact on the resource utilization and global productivity in data centers compared with the conventional static resource allocation policy. However, the non-negligible additional development effort this solution imposes has constrained its adoption by the scientific programming community. In this work, we present DMRlib, a library designed to offer the global advantages of process malleability while providing a minimalist MPI-like syntax. The library includes a series of predefined communication patterns that greatly ease the development of malleable applications. In addition, we deploy several scenarios to demonstrate the positive impact of process malleability featuring different scalability patterns. Concretely, we study two job submission modes (rigid and moldable) in order to identify the best-case scenarios for malleability using metrics such as resource allocation rate, completed jobs per second, and energy consumption. The experiments prove that our elastic approach may improve global throughput by a factor higher than 3x compared to the traditional workloads of non-malleable jobs.