Academic Paper

Templated Hybrid Reusable Computational Analytics Workflow Management with Cloudmesh, Applied to the Deep Learning MLCommons Cloudmask Application
Document Type
Conference
Source
2023 IEEE 19th International Conference on e-Science (e-Science), pp. 1-6, Oct. 2023
Subject
Communication, Networking and Broadcast Technologies
Computing and Processing
General Topics for Engineers
Deep learning
Codes
Operating systems
Linux
Benchmark testing
History
Synchronization
Task analysis
Artificial intelligence
Monitoring
experiment workflow
task workflow
hyperparameter workflow
high-performance computing
batch queue management
workflow web service
cloudmesh
Language
ISSN
2325-3703
Abstract
In this paper, we summarize our effort to create and utilize an integrated framework that coordinates computational AI analytics tasks with the help of a task and experiment management workflow system. Our design follows a minimalistic approach while still allowing access to hybrid computational resources offered through the owner's computer, HPC computing centers, cloud resources, and distributed systems in general. The framework can be accessed through a GUI for monitoring and managing the workflow, a REST service, a command line interface, and a Python interface. It uses a template-based batch management system that, through configuration files, allows reproducible experiments to be generated easily while creating permutations over selected experiment parameters, as is typical in deep learning applications. The resulting framework was developed for analytics workflows targeting MLCommons benchmarks of AI applications on hybrid computing resources, and also as an educational tool for teaching scientists and students sophisticated concepts for executing computations on resources ranging from a single computer to many thousands of computers as part of on-premise and cloud infrastructure. We demonstrate the usefulness of the tool by generating FAIR principle-based application accuracy benchmarks for the MLCommons Science Working Group Cloudmask application. The code is available as an open-source project on GitHub and is based on an easy-to-enhance framework called Cloudmesh. It can easily be applied to other applications.
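
The template-based generation of experiment permutations described in the abstract can be illustrated with a minimal sketch. It uses only the Python standard library and does not reproduce the Cloudmesh API. The template text, the parameter names (`epochs`, `learning_rate`, `gpus`), the script name `train.py`, and the function `generate_experiments` are all hypothetical and chosen only for illustration.

```python
from itertools import product
from string import Template

# Hypothetical batch-script template; the placeholders and train.py are
# illustrative assumptions, not taken from the paper or from Cloudmesh.
SCRIPT_TEMPLATE = Template(
    "#!/bin/bash\n"
    "#SBATCH --job-name=cloudmask-$experiment_id\n"
    "#SBATCH --gres=gpu:$gpus\n"
    "python train.py --epochs $epochs --learning-rate $learning_rate\n"
)

# Hypothetical experiment parameters, as they might appear in a
# configuration file. Each key lists the values to permute over.
PARAMETERS = {
    "epochs": [10, 30],
    "learning_rate": [0.001, 0.01],
    "gpus": [1],
}


def generate_experiments(template, parameters):
    """Return a dict mapping an experiment id to a rendered batch script.

    One script is rendered for every combination (Cartesian product)
    of the listed parameter values.
    """
    names = sorted(parameters)  # fixed order keeps ids reproducible
    scripts = {}
    for values in product(*(parameters[name] for name in names)):
        config = dict(zip(names, values))
        experiment_id = "_".join(f"{name}-{config[name]}" for name in names)
        scripts[experiment_id] = template.substitute(
            config, experiment_id=experiment_id
        )
    return scripts


if __name__ == "__main__":
    for experiment_id, script in generate_experiments(
        SCRIPT_TEMPLATE, PARAMETERS
    ).items():
        print(f"--- {experiment_id} ---")
        print(script)
```

With the parameters above, the sketch renders four scripts (two epoch values times two learning rates times one GPU count). Deriving the experiment id from the sorted parameter names is one simple way to keep the generated experiments reproducible across runs.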