Academic Paper

A Hybrid Solution to Provide End-to-End Flow Control and Congestion Management in High-Performance Interconnection Networks
Document Type
Conference
Source
2024 IEEE 24th International Symposium on Cluster, Cloud and Internet Computing (CCGrid), pp. 8-17, May 2024
Subject
Computing and Processing
Degradation
Data centers
Cloud computing
Multiprocessor interconnection
Switches
Traffic control
Supercomputers
Servers
Proposals
Standards
High-Performance Computing
Interconnection Networks
Data center Networks
congestion management
priority flow control
end-to-end flow control
Language
ISSN
2993-2114
Abstract
Congestion seriously threatens high-performance interconnection networks in supercomputers and data centers, where thousands of server nodes generate massive communication operations when running highly parallel and distributed applications and services. In recent years, numerous solutions have been proposed to address congestion and its effects, including flow control (e.g., priority flow control, PFC) to prevent packet dropping at congested buffers, injection throttling to detect congested points and notify source server nodes to reduce the injection rate of congesting flows, and congestion isolation (as defined in the IEEE 802.1Qcz standard) that stores the congesting flows in separate queues or virtual channels (VCs) at switch buffers. Unfortunately, these solutions have exhibited important drawbacks, such as the prohibitive latency generated by flow control during congestion situations, the slow and ineffective response of injection throttling, or the excessive resources required to identify and isolate congesting flows. In this paper, we propose a hybrid congestion management solution, called 3SC (from three strategies combined), which combines end-to-end flow control, injection throttling, and congesting-flow isolation. 3SC significantly reduces Head-of-Line (HoL) blocking by isolating congesting flows in special queues at some switches, swiftly throttles the injection of these congesting flows, and significantly decreases the number of flow control messages compared to PFC. To evaluate our proposal, we have conducted a large set of simulation experiments for different network configurations and realistic traffic patterns. The results demonstrate that 3SC is efficient and feasible, making it a promising solution for future interconnection network designs.