Academic Paper

High-performance scalable information service for the ATLAS experiment
Document Type
Conference
Source
2012 18th IEEE-NPSS Real Time Conference (RT), pp. 1-5, Jun. 2012
Subject
Nuclear Engineering
Power, Energy and Industry Applications
Engineered Materials, Dielectrics and Plasmas
Fields, Waves and Electromagnetics
General Topics for Engineers
Communication, Networking and Broadcast Technologies
Components, Circuits, Devices and Systems
Computing and Processing
Monitoring
Servers
Information services
Message systems
Mirrors
Receivers
Software
Abstract
The ATLAS experiment is operated by a highly distributed computing system that constantly produces a large amount of status information, which is used to monitor the operational conditions of the experiment as well as to assess the quality of the physics data being taken. For example, the ATLAS High Level Trigger (HLT) algorithms are executed on the online computing farm, which consists of about 1500 nodes. Each HLT algorithm produces a few thousand histograms, which have to be integrated over the whole farm and carefully analyzed in order to properly tune the event rejection. To handle such non-physics data, the Information Service (IS) facility has been developed within the scope of the ATLAS Trigger and Data Acquisition (TDAQ) project. The IS provides a high-performance, scalable solution for information exchange in a distributed environment. In the course of an ATLAS data-taking session, the IS handles about a hundred gigabytes of information, which is constantly updated at intervals ranging from one second to a few tens of seconds. The IS provides access to any information item on request and also distributes notifications to all information subscribers; in the latter case, subscribers receive the information within a few milliseconds of its update. The IS can handle arbitrary types of information, including the histograms produced by the HLT applications, and provides C++, Java and Python APIs. The Information Service is the primary, and in most cases the only, source of information for the majority of the online monitoring, analysis and GUI applications used to control and monitor the ATLAS experiment. The IS also provides streaming functionality that allows efficient replication of all or part of the managed information. This functionality is used to duplicate a subset of the ATLAS monitoring data to the CERN public network with a latency of a few milliseconds, allowing efficient real-time monitoring of data taking from outside the protected ATLAS network.
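The two access modes described above (reading any item on request, and pushing updates to subscribers), together with the partial replication used for the public-network copy, can be illustrated with a minimal single-process sketch in Python. This is not the IS API: the class `InfoStore`, its methods, the regular-expression subscriptions, the `mirror` helper and all item names are hypothetical, and delivery here is a synchronous in-process callback rather than the network notification of a distributed service.

```python
from __future__ import annotations

import re
from typing import Any, Callable, Dict, List, Tuple

# Hypothetical callback signature: (item_name, new_value) -> None
Callback = Callable[[str, Any], None]


class InfoStore:
    """Toy publish/subscribe information store (illustrative, not the IS API)."""

    def __init__(self) -> None:
        self._items: Dict[str, Any] = {}
        # Each subscription pairs a compiled name pattern with a callback.
        self._subs: List[Tuple[re.Pattern[str], Callback]] = []

    def publish(self, name: str, value: Any) -> None:
        """Create or update an item, then notify every matching subscriber."""
        self._items[name] = value
        for pattern, callback in list(self._subs):
            if pattern.fullmatch(name):
                callback(name, value)

    def get(self, name: str) -> Any:
        """On-request access; raises KeyError if the item does not exist."""
        return self._items[name]

    def subscribe(self, regex: str, callback: Callback) -> None:
        """Call `callback` for every future update whose name fully matches regex."""
        self._subs.append((re.compile(regex), callback))

    def snapshot(self, regex: str) -> Dict[str, Any]:
        """Return a copy of all current items whose name fully matches regex."""
        pattern = re.compile(regex)
        return {n: v for n, v in self._items.items() if pattern.fullmatch(n)}


def mirror(source: InfoStore, target: InfoStore, regex: str) -> None:
    """Replicate the subset of `source` matching `regex` into `target`:
    copy the matching items once, then stream later updates via a subscription."""
    for name, value in source.snapshot(regex).items():
        target.publish(name, value)
    source.subscribe(regex, target.publish)


# Usage with hypothetical item names.
online = InfoStore()  # stands in for the protected online store
public = InfoStore()  # stands in for the public-network copy
received: List[Tuple[str, Any]] = []

online.publish("Monitoring.l1_rate", 100.0)       # exists before mirroring starts
mirror(online, public, r"Monitoring\..*")         # copy the subset, then stream it
online.subscribe(r"DAQ\..*", lambda n, v: received.append((n, v)))
online.publish("DAQ.run_number", 42)              # delivered to the DAQ subscriber only
online.publish("Monitoring.hlt_rate", [1, 2, 3])  # streamed to the mirror
```

Subscribing by name pattern lets one mechanism serve both an individual monitoring client and a replica that wants only part of the namespace; a real deployment would also need to handle slow or disconnected subscribers, which this sketch ignores.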
Each information item in the IS has an associated URL that can be used to access that item online via HTTP. This functionality is used by many online monitoring applications that run in a web browser, making real-time monitoring information about the ATLAS experiment available across the globe. This paper describes the design and implementation of the IS and presents performance results obtained in the ATLAS operational environment.
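Per-item URL access can be sketched as a small Python WSGI application that serves each item of a dictionary at its own path. The URL layout `/info/<item_name>`, the JSON payload and the function `make_app` are hypothetical assumptions for illustration; the actual URL scheme and encoding of the IS web interface are not specified here.

```python
import json
from typing import Any, Callable, Dict, Iterable, List, Tuple

# Hypothetical URL layout: /info/<item_name> (not the real IS web interface).
PREFIX = "/info/"

StartResponse = Callable[[str, List[Tuple[str, str]]], Any]


def make_app(items: Dict[str, Any]) -> Callable[[Dict[str, Any], StartResponse], Iterable[bytes]]:
    """Return a WSGI application serving each item of `items` as JSON at its own URL.
    Values must be JSON-serializable; the dict is read on every request."""

    def app(environ: Dict[str, Any], start_response: StartResponse) -> Iterable[bytes]:
        path = environ.get("PATH_INFO", "")
        name = path[len(PREFIX):] if path.startswith(PREFIX) else None
        if name is None or name not in items:
            start_response("404 Not Found", [("Content-Type", "text/plain")])
            return [b"no such item\n"]
        body = json.dumps({"name": name, "value": items[name]}).encode("utf-8")
        start_response("200 OK", [("Content-Type", "application/json"),
                                  ("Content-Length", str(len(body)))])
        return [body]

    return app
```

For a local trial, `wsgiref.simple_server.make_server("localhost", 8080, make_app(items)).serve_forever()` from the Python standard library serves the dictionary. Because the dictionary is read on every request, a browser page that polls an item's URL sees each updated value.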