Academic Paper

HEPnOS: a Specialized Data Service for High Energy Physics Analysis
Document Type
Conference
Source
2023 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW), pp. 637-646, May 2023
Subject
Computing and Processing
Neutrino sources
Runtime
Scalability
Government
Distributed databases
Signal processing algorithms
Load management
HPC
Storage
Mochi
Language
Abstract
In this paper, we present HEPnOS, a distributed data service for managing data produced by high-energy physics (HEP) experiments. Using HEPnOS, HEP applications can use HPC resources more efficiently than traditional file-based applications. The file-based model leads to a rigid, chunk-based allocation of computational resources and limits the number of cores that can be used concurrently by an HEP application. The fundamental problem is that organizing domain-specific data into files inadvertently introduces a single, artificial tuning parameter, file size, that puts key optimization goals into conflict: larger files reduce metadata overhead and thus improve I/O efficiency, but smaller files provide more opportunity for workflow parallelism and load balancing. In this work, we introduce a domain-specific data service that removes that constraint so that data can be accessed and processed at its natural granularity while still maintaining I/O efficiency. By removing the constraints introduced by file handling, we obtain better scaling and make efficient use of more cores for processing a fixed-size data sample. We demonstrate the improved scalability by comparing an application developed in the file-based paradigm with a version modified to use HEPnOS.