Academic Paper

Building Blocks for a System-Wide Power and Thermal Management Framework
Document Type
Conference
Source
2015 IEEE 21st International Conference on Parallel and Distributed Systems (ICPADS), pp. 700-707, Dec. 2015
Subject
Communication, Networking and Broadcast Technologies
Computing and Processing
Temperature measurement
Mathematical model
Program processors
Clocks
Random access memory
Semiconductor device measurement
Power measurement
Language
ISSN
1521-9097
Abstract
Next-generation exascale systems face the difficult challenge of managing the power and thermal constraints that come from packing more transistors into a smaller space while adding more processors to a single system. To combat this, HPC center operators are looking for methodologies to save operational energy. Energy consumption in an HPC center is governed by complex interactions among a number of different components. Without a coordinated, system-wide perspective on reducing energy consumption, an isolated action taken on one component to lower its energy consumption can raise the consumption of another component, thereby canceling out the intended savings. For example, increasing the setpoint (or ambient temperature) to save cooling energy can lead to increased compute-node fan power and increased chip leakage power. This paper presents the building blocks required to develop and implement a system-wide framework that takes a coordinated approach to thermal and power management decisions at the compute-node level (e.g., CPU speed throttling) and the infrastructure level (e.g., selecting an optimal setpoint). These building blocks consist of a suite of models that characterize the thermal and power footprint of different computations and relate computational properties to datacenter operating conditions.
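The setpoint trade-off described in the abstract can be illustrated with a toy model. The sketch below is not taken from the paper: every function name, coefficient, and the setpoint range is a hypothetical assumption chosen only to show how falling cooling power and rising fan and leakage power can combine into a system-wide optimum. It assumes cooling power falls linearly with the setpoint, fan power follows the cube of fan speed (the fan affinity law), and leakage power grows exponentially with temperature.

```python
import math

# All coefficients below are hypothetical and for illustration only;
# they are not measurements or models from the paper.

def cooling_power_kw(setpoint_c):
    """Assumed: facility cooling power falls linearly as the setpoint rises."""
    return max(0.0, 100.0 - 2.0 * setpoint_c)

def fan_power_kw(setpoint_c):
    """Assumed: fan speed rises linearly with the setpoint, and fan power
    scales with the cube of speed (fan affinity law)."""
    speed = min(1.0, max(0.4, 0.4 + 0.03 * (setpoint_c - 18.0)))
    return 20.0 * speed ** 3

def leakage_power_kw(setpoint_c):
    """Assumed: chip leakage power grows exponentially with temperature."""
    return 10.0 * math.exp(0.04 * (setpoint_c - 18.0))

def total_power_kw(setpoint_c):
    """System-wide power: infrastructure cooling plus compute-node effects."""
    return (cooling_power_kw(setpoint_c)
            + fan_power_kw(setpoint_c)
            + leakage_power_kw(setpoint_c))

def best_setpoint(candidates):
    """Pick the candidate setpoint with the lowest system-wide power."""
    return min(candidates, key=total_power_kw)

if __name__ == "__main__":
    candidates = range(18, 36)  # hypothetical setpoint range in degrees C
    for t in candidates:
        print(f"{t:2d} C  cooling={cooling_power_kw(t):6.2f}  "
              f"fan={fan_power_kw(t):6.2f}  "
              f"leakage={leakage_power_kw(t):6.2f}  "
              f"total={total_power_kw(t):6.2f}")
    print("lowest total power at setpoint:", best_setpoint(candidates), "C")
```

Under these assumed coefficients, minimizing cooling power alone would push the setpoint to the top of the range, while the total is minimized at a lower, interior setpoint. That is the kind of cross-component interaction a coordinated framework would need to capture.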