Academic Paper

BOW: Breathing Operand Windows to Exploit Bypassing in GPUs
Document Type
Conference
Source
2020 53rd Annual IEEE/ACM International Symposium on Microarchitecture (MICRO), pp. 996-1008, Oct. 2020
Subject
Components, Circuits, Devices and Systems
Computing and Processing
Radio frequency
Power demand
Pipelines
Graphics processing units
Organizations
Registers
Optimization
operand bypassing
GPU
register file
microarchitecture
compiler
Abstract
The Register File (RF) is a critical structure in Graphics Processing Units (GPUs), responsible for a large portion of the area and power. To simplify the architecture of the RF, it is organized in a multi-bank configuration with a single port for each bank. Not surprisingly, the frequent accesses to the register file during kernel execution incur a sizeable overhead in GPU power consumption and introduce delays, since accesses are serialized when port conflicts occur. In this paper, we observe that there is a high degree of temporal locality in accesses to the registers: within short instruction windows, the same registers are often accessed repeatedly. We characterize the opportunities to reduce register accesses as a function of the size of the instruction window considered, and establish that there are many recurring reads and updates of the same register operands in most GPU computations. To exploit this opportunity, we propose Breathing Operand Windows (BOW), an enhanced GPU pipeline and operand collector organization that supports bypassing register file accesses and instead passes values directly between instructions within the same window. Our baseline design can only bypass register reads; we introduce an improved design capable of also bypassing unnecessary write operations to the RF. To further reduce the write traffic, we introduce compiler optimizations that help guide the write-back destination of operands depending on whether they will be reused. To reduce the storage overhead, we analyze the occupancy of the bypass buffers and discover that we can significantly downsize them without losing performance. BOW, together with these optimizations, reduces dynamic energy consumption of the register file by 55% and improves performance by 11%, with a modest overhead of a 12 KB increase in the size of the operand collectors (4% of the register file size).
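The characterization described in the abstract (how many register reads could be avoided as a function of the instruction window size) can be illustrated with a small sketch. This is not the paper's methodology: the trace format, the `bypassable_reads` function, and the rule that a read is bypassable when the same register was read or written by one of the previous `window - 1` instructions are all assumptions made for illustration.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical instruction format (an assumption, not from the paper):
# (destination register or None, list of source registers).
Instr = Tuple[Optional[str], List[str]]


def bypassable_reads(trace: List[Instr], window: int) -> Tuple[int, int]:
    """Return (bypassable, total) source-operand reads for one warp's trace.

    Assumed model: a read can skip the register file if the same register
    was read or written by one of the previous `window - 1` instructions.
    """
    last_touch: Dict[str, int] = {}  # register -> index of last instruction touching it
    bypassable = 0
    total = 0
    for i, (dest, srcs) in enumerate(trace):
        distinct = set(srcs)  # count each distinct source once per instruction
        for reg in distinct:
            total += 1
            j = last_touch.get(reg)
            if j is not None and i - j < window:
                bypassable += 1
        # Record this instruction's accesses only after checking its reads.
        for reg in distinct:
            last_touch[reg] = i
        if dest is not None:
            last_touch[dest] = i
    return bypassable, total


# Toy trace, invented for illustration only.
trace: List[Instr] = [
    ("r1", ["r0"]),
    ("r2", ["r1", "r0"]),
    ("r3", ["r2", "r1"]),
    ("r4", ["r0", "r3"]),
]

for w in (1, 2, 3):
    hit, total = bypassable_reads(trace, w)
    print(f"window={w}: {hit}/{total} reads bypassable")
```

On this toy trace, widening the window from 2 to 3 instructions raises the bypassable count from 5 to 6 of the 7 reads, which mirrors the kind of window-size sensitivity the abstract refers to. A real study would use per-warp instruction traces from a GPU simulator or profiler rather than a hand-written list.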