Academic Paper

A Learning-based Fetch Thread Gating Mechanism for A Simultaneous Multithreading Processor
Document Type
Conference
Source
2020 Eighth International Symposium on Computing and Networking (CANDAR), pp. 1-10, Nov. 2020
Subject
Computing and Processing
Adaptation models
Multithreading
Instruction sets
Logic gates
Hardware
System-on-chip
Integrated circuit modeling
simultaneous multithreading
microarchitecture
perceptron
fetch policy
Language
ISSN
2379-1896
Abstract
Simultaneous Multithreading (SMT) technology is widely adopted in modern high-end processors to maximize on-chip hardware utilization. In an SMT processor, multiple threads execute in parallel and share hardware resources. This technique exposes efficiency gains that are not available in a single-threaded processor. However, when a data cache miss or a branch misprediction occurs, competition among threads for hardware resources degrades hardware utilization. Instruction fetch policies have therefore been proposed to manage hardware resources efficiently in SMT processors. These fetch policies distribute hardware resources indirectly, through fetch bandwidth control.

Conventionally, a fetch policy selects the threads to fetch from based only on resource usage at that moment, and the characteristics of the threads are not exploited in the decision. As a result, most conventional fetch control schemes take effect only after an outstanding event occurs, and the ability to restrict resource usage is limited even with an aggressive fetch control scheme.

In this paper, we propose a Fetch Gate Estimator (FGE), a fetch gating mechanism based on machine learning and implemented as a hardware module. The FGE evaluates each thread to decide whether instruction fetch from that thread should be gated. The FGE is trained dynamically, in parallel with program execution, on the execution statistics that result from its inferences, so that the characteristics of each thread are encoded into the learning model. We adopted a single-layer perceptron as the learning model inside the FGE for circuit simplicity, and investigated its performance impact. Evaluation results show that the perceptron-based FGE can train itself from the acquired execution statistics to identify inefficient threads adaptively, adjusting resources through the fetch gate control mechanism.
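
The abstract does not specify the FGE's input features, weight widths, or training rule. As a rough illustration only, the following Python sketch shows how a single-layer perceptron could produce a per-thread gate decision and be updated online. Everything here is an assumption rather than the paper's design: the class name `PerceptronGate`, the bipolar (+1/-1) feature encoding, the training threshold `theta`, the saturation bound `w_max`, and the threshold-based update rule, which is borrowed from perceptron branch predictors.

```python
from dataclasses import dataclass, field
from typing import List, Sequence


@dataclass
class PerceptronGate:
    """Hypothetical single-layer perceptron for a fetch gate decision."""
    n_features: int
    theta: int = 8    # assumed training threshold
    w_max: int = 31   # assumed saturation bound, mimicking small hardware counters
    weights: List[int] = field(default_factory=list)

    def __post_init__(self) -> None:
        # weights[0] is the bias; the rest pair with the input features.
        self.weights = [0] * (self.n_features + 1)

    def score(self, x: Sequence[int]) -> int:
        # Dot product of the weights with the bipolar (+1/-1) features, plus bias.
        return self.weights[0] + sum(w * xi for w, xi in zip(self.weights[1:], x))

    def should_gate(self, x: Sequence[int]) -> bool:
        # A positive score is read as "this thread looks inefficient: gate it".
        return self.score(x) > 0

    def train(self, x: Sequence[int], was_inefficient: bool) -> None:
        # Update on a wrong prediction or when confidence is at or below theta.
        y = self.score(x)
        target = 1 if was_inefficient else -1
        if (y > 0) != was_inefficient or abs(y) <= self.theta:
            inputs = [1] + list(x)  # a constant 1 drives the bias weight
            for i, xi in enumerate(inputs):
                updated = self.weights[i] + target * xi
                self.weights[i] = max(-self.w_max, min(self.w_max, updated))


# Usage with made-up feature vectors (e.g. "cache miss pending", "queue mostly free").
gate = PerceptronGate(n_features=3)
stalled = [1, -1, 1]
healthy = [-1, 1, -1]
for _ in range(4):
    gate.train(stalled, was_inefficient=True)
    gate.train(healthy, was_inefficient=False)
print(gate.should_gate(stalled), gate.should_gate(healthy))
```

In this sketch one `PerceptronGate` instance would be kept per hardware thread, and the `was_inefficient` label would come from execution statistics observed after each decision. How the paper derives that label is not stated in the abstract.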