Academic Paper

Characterization and bottleneck analysis of a 64-bit ARMv8 platform
Document Type
Conference
Source
2016 IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS), pp. 36-45, Apr. 2016
Subject
Communication, Networking and Broadcast Technologies
Components, Circuits, Devices and Systems
Computing and Processing
Bridges
Computational modeling
Power measurement
Radiation detectors
Hardware
Benchmark testing
Energy consumption
Language
Abstract
This paper presents the first comprehensive study of the performance, power, and energy consumption of the Applied-Micro X-Gene, the first commercially available 64-bit ARMv8 platform, for HPC workloads. Our study includes a detailed comparison of the X-Gene to three other architectural design points common in HPC systems. On these platforms, we perform careful measurements across 400+ workloads, covering different application domains, parallelization models, floating-point precision models, and memory intensities. We find that the X-Gene has, on average, 1.2× better energy consumption than an Intel Sandy Bridge, a design commonly found in HPC installations, while the Sandy Bridge is, on average, 2.3× faster than the X-Gene. Precisely quantifying the causes of performance and energy differences between two platforms is an important but challenging problem. It is often addressed via detailed simulation, an approach that has limited ability to scale up to full applications and broad workload mixes. Instead, this paper adopts a statistical framework called Partial Least Squares (PLS) Path Modeling. Using readily available hardware counter measurements, PLS Path Modeling allows us to capture complex cause-effect relationships and difficult-to-measure performance concepts relating to how effectively architectural units and subsystems improve application performance. We use PLS Path Modeling to quantify the causes of the performance differences between the X-Gene and Sandy Bridge in the HPC domain, finding that the performance of the memory subsystem is the dominant cause of these differences.