Academic Paper

CloudSentry: Two-Stage Heavy Hitter Detection for Cloud-Scale Gateway Overload Protection
Document Type
Periodical
Source
IEEE Transactions on Parallel and Distributed Systems (IEEE Trans. Parallel Distrib. Syst.), 35(4):616-633, Apr. 2024
Subject
Computing and Processing
Communication, Networking and Broadcast Technologies
Cloud computing
Logic gates
Software
Scalability
Hardware
Throughput
Production
CPU Spikes
cloud network
heavy hitter detection
Language
ISSN
1045-9219
1558-2183
2161-9883
Abstract
Cloud vendors share resources among millions of tenants across the world to achieve economies of scale. At the same time, the cloud network maintains performance isolation between tenants, as if each were using private, dedicated resources. However, heavy hitters caused by a single tenant at cloud gateways break this isolation, undermining the predictable performance that other tenants expect. To prevent this, heavy hitter detection becomes a key concern at performance-critical cloud gateways, but it faces a dilemma between fine granularity and low overhead. In this work, we present CloudSentry, a scalable two-stage heavy hitter detection system for multi-tenant cloud gateways that addresses this dilemma. CloudSentry uses CPU utilization as an indicator of heavy hitters and runs a lightweight coarse-grained detection 24/7 to detect CPU spikes. When a spike occurs, it invokes a fine-grained detection that dumps and analyzes the potential heavy-hitter packets. A more comprehensive analysis then associates the heavy hitters with cloud service scenarios and invokes a corresponding backpressure procedure. CloudSentry significantly reduces memory, computation, and storage overhead compared with existing approaches. In a gateway cluster under an average traffic throughput of 251 Gbps, CloudSentry consumes only 2%–5% CPU utilization and 8 KB of run-time memory, and produces only 10 MB of heavy hitter logs in one month. Additionally, as CloudSentry has been deployed in Alibaba Cloud for over two years, we share case studies and deployment experience in this article.
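The two-stage idea in the abstract can be illustrated with a minimal sketch. This is not the CloudSentry implementation: the threshold value, the function names, the `capture_packets` callback, and the use of a simple per-flow packet count are all assumptions made for illustration. The sketch only shows the control flow: a cheap CPU check runs on every sample, and the costlier packet analysis runs only when a spike is seen.

```python
from collections import Counter
from typing import Callable, Hashable, Iterable, List, Tuple

# Hypothetical value; the abstract does not state the spike threshold.
CPU_SPIKE_THRESHOLD = 0.80

# One report per spike: (sample index, [(flow key, packet count), ...]).
Report = Tuple[int, List[Tuple[Hashable, int]]]


def is_cpu_spike(cpu_utilization: float, threshold: float) -> bool:
    """Stage 1 (coarse-grained): a cheap check run on every CPU sample."""
    return cpu_utilization >= threshold


def top_flows(flow_keys: Iterable[Hashable], k: int) -> List[Tuple[Hashable, int]]:
    """Stage 2 (fine-grained): count dumped packets per flow key, keep the top k."""
    return Counter(flow_keys).most_common(k)


def detect_heavy_hitters(
    cpu_samples: Iterable[float],
    capture_packets: Callable[[int], Iterable[Hashable]],
    threshold: float = CPU_SPIKE_THRESHOLD,
    k: int = 3,
) -> List[Report]:
    """Run stage 1 on every sample and stage 2 only at CPU spikes.

    `capture_packets(i)` is an assumed callback that returns the flow key of
    each packet dumped around sample i; it is called only when a spike occurs.
    """
    reports: List[Report] = []
    for index, cpu in enumerate(cpu_samples):
        if not is_cpu_spike(cpu, threshold):
            continue  # common case: no packet capture, no extra cost
        reports.append((index, top_flows(capture_packets(index), k)))
    return reports
```

Calling `detect_heavy_hitters` with a list of CPU utilization samples and a capture callback returns one report per spike. The later step described in the abstract, associating heavy hitters with service scenarios and triggering backpressure, is left out of this sketch.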