Academic Paper

Uncovering the performance bottleneck of modern HPC processor with static code analyzer: a case study on Kunpeng 920
Document Type
Original Paper
Source
CCF Transactions on High Performance Computing, pp. 1-22
Subject
High-performance computing
Micro-architecture exploration
Performance modeling and simulation
Static analysis
Throughput prediction
Language
English
ISSN
2524-4922
2524-4930
Abstract
The performance of high-performance computing (HPC) and other real-world applications is becoming harder to predict as the micro-architecture of the modern central processing unit (CPU) grows increasingly complex. As a consequence, predicting the execution time of a code snippet is notoriously difficult. A basic block throughput predictor is a crucial component of a static code analyzer: it offers a widely applicable method for predicting the execution time of a basic block. In this article, we build a workflow to faithfully run, collect, and analyze basic blocks from real-world applications. Several static code analyzers are introduced, compared, and optimized to determine which performs best in accuracy and other metrics on a Kunpeng 920 processor. Through extensive experiments, we achieve a state-of-the-art accuracy of 86.7% in predicting the throughput of all basic blocks. Moreover, we showcase two potential applications of our optimized static code analyzer: (1) guiding application optimization through bottleneck analysis, and (2) exposing the potential bottlenecks of a CPU on a given workload through fast hardware pre-evaluation.