Academic Paper

An HPC-Container Based Continuous Integration Tool for Detecting Scaling and Performance Issues in HPC Applications
Document Type
Periodical
Source
IEEE Transactions on Services Computing (IEEE Trans. Serv. Comput.), 17(1):156-168, Jan. 2024
Subject
Computing and Processing
General Topics for Engineers
Testing
Task analysis
Scalability
Software quality
Monitoring
Cloud computing
Software development management
Scalability test
continuous integration
high performance computing
cloud computing
container
Language
ISSN
1939-1374
2372-0204
Abstract
Testing is one of the most important steps in software development because it ensures the quality of software. Continuous Integration (CI) is a widely used testing standard that reports software quality to the developer in a timely manner during development. Performance, especially scalability, is another key factor for High Performance Computing (HPC) applications. Many profiling and performance tools exist for HPC applications, but none of them is integrated into CI tools. In this work, we propose BeeSwarm, an HPC-container-based parallel scaling performance system that can be easily applied to current CI test environments. BeeSwarm is designed mainly for HPC application developers who need to monitor how their applications scale on different compute resources. We demonstrate BeeSwarm using three HPC applications: CoMD, LULESH, and NWChem. We utilize GitHub Actions and provision resources from Google Compute Engine. Our results show that BeeSwarm can be used for scalability and performance testing of a variety of HPC applications, allowing developers to monitor application performance over time.