Academic Paper

Automatic, Abstracted and Portable Topology-Aware Thread Placement

Document Type

Conference

Author

Gustedt, Jens; Jeannot, Emmanuel; Mansouri, Farouk

Source

2017 IEEE International Conference on Cluster Computing (CLUSTER), pp. 389-399, Sep. 2017

Subject

Computing and Processing
Instruction sets
Runtime
Topology
Hardware
Computer architecture
Libraries
Thread placement
Task based runtimes
Hardware affinity
Parallel programming

ISSN

2168-9253

Abstract

Efficiently programming shared-memory machines is a difficult challenge because the mapping of application threads onto the memory hierarchy has a strong impact on performance. However, optimizing such thread placement is difficult: architectures become increasingly complex, and application behavior changes with implementations and input parameters, e.g., problem size and number of threads. In this work, we propose a fully automatic, abstracted and portable affinity module. It produces and implements an optimized affinity strategy that combines knowledge about application characteristics and the platform topology. Implemented in the back-end of our runtime system (ORWL), our approach was used to enhance the performance and scalability of several unmodified ORWL-coded applications: matrix multiplication, a 2D stencil (Livermore Kernel 23), and a real-world video tracking application. On two SMP machines with quite different hardware characteristics, our tests show spectacular performance improvements for these unmodified application codes, due to a dramatic decrease in cache misses and pipeline stalls. A comparison to reference implementations using OpenMP confirms this performance gain of almost one order of magnitude.
