Academic Paper

Flow Classification for Software-Defined Data Centers Using Stream Mining
Document Type
Periodical
Source
IEEE Transactions on Services Computing (IEEE Trans. Serv. Comput.), 12(1):105-116, Jan. 2019
Subject
Computing and Processing
General Topics for Engineers
Data mining
Bandwidth
Hardware
Predictive models
Routing
Data models
Mice
Flow classification
streaming mining
software defined data center networks
Language
ISSN
1939-1374
2372-0204
Abstract
Traffic management is known to be important for effectively utilizing the high bandwidth provided by data centers. Recent works have focused on identifying elephant flows and rerouting them to improve network utilization. These approaches, however, require either significant monitoring overhead or hardware/end-host modifications. In this paper, we propose FlowSeer, a fast, low-overhead elephant flow detection and scheduling system using data stream mining. Our key idea is that features from a flow's first few packets allow us to train streaming classification models that can accurately and quickly predict the rate and duration of any initiated flow. With this predicted information, FlowSeer can adapt the routing policies of elephant flows to their demands and to dynamic network conditions. Another advantage of FlowSeer is that it enables the controller and switches to perform cooperative prediction: most decisions can be made locally by switches, thereby reducing both detection latency and signaling overhead. FlowSeer requires fewer than 100 flow table entries at each switch to enable cooperative prediction and hence can be implemented on off-the-shelf switches. Evaluation via both experiments in realistic virtual networks and trace-driven simulations shows that FlowSeer improves throughput several-fold over Hedera, which pulls flow statistics, and performs comparably to Mahout, which needs end-host modification.
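To make the two core ideas concrete (classifying a flow from its first few packets with a model that is updated as labeled flows stream in, and letting the switch decide locally unless it is unsure), here is a minimal, hypothetical sketch. It is not FlowSeer's implementation: the paper predicts both flow rate and duration, whereas this sketch reduces the task to a binary elephant/mouse label. The features (mean packet size and mean inter-arrival gap of the first packets), the swapped-in learner (incremental logistic regression trained by SGD), the confidence margin, and the names `first_packet_features`, `OnlineLogisticClassifier`, and `cooperative_decision` are all illustrative assumptions.

```python
import math
from dataclasses import dataclass, field
from typing import List, Sequence, Tuple


def first_packet_features(sizes: Sequence[int], gaps: Sequence[float]) -> List[float]:
    """Hypothetical features from a flow's first packets.

    sizes: packet sizes in bytes; gaps: inter-arrival gaps in seconds.
    Returns [mean size in KB, mean gap in ms, bias term].
    """
    mean_size_kb = sum(sizes) / len(sizes) / 1000.0
    mean_gap_ms = (sum(gaps) / len(gaps)) * 1000.0 if gaps else 0.0
    return [mean_size_kb, mean_gap_ms, 1.0]


@dataclass
class OnlineLogisticClassifier:
    """Streaming binary classifier (assumed learner: logistic regression via SGD).

    It is updated one labeled flow at a time, so it never stores past flows.
    """
    n_features: int
    lr: float = 0.1
    w: List[float] = field(default_factory=list)

    def __post_init__(self) -> None:
        if not self.w:
            self.w = [0.0] * self.n_features

    def predict_proba(self, x: Sequence[float]) -> float:
        """Probability that the flow is an elephant."""
        z = sum(wi * xi for wi, xi in zip(self.w, x))
        z = max(-30.0, min(30.0, z))  # clamp to avoid overflow in exp
        return 1.0 / (1.0 + math.exp(-z))

    def update(self, x: Sequence[float], label: int) -> None:
        """One SGD step on log loss; label is 1 for elephant, 0 for mouse."""
        err = self.predict_proba(x) - label
        self.w = [wi - self.lr * err * xi for wi, xi in zip(self.w, x)]


def cooperative_decision(model: OnlineLogisticClassifier, x: Sequence[float],
                         margin: float = 0.3) -> Tuple[str, str]:
    """Hypothetical switch/controller split: decide locally when the
    probability is far from 0.5, otherwise escalate to the controller."""
    p = model.predict_proba(x)
    if p >= 0.5 + margin:
        return ("elephant", "switch")
    if p <= 0.5 - margin:
        return ("mouse", "switch")
    return ("undecided", "controller")


if __name__ == "__main__":
    model = OnlineLogisticClassifier(n_features=3)
    # Synthetic, illustrative flows (not measurements from the paper).
    elephant = first_packet_features([1400, 1400, 1400], [0.0001, 0.0001])
    mouse = first_packet_features([100, 100, 100], [0.005, 0.005])
    for _ in range(500):  # simulate a stream of labeled flows
        model.update(elephant, 1)
        model.update(mouse, 0)
    print(cooperative_decision(model, elephant))
    print(cooperative_decision(model, mouse))
```

A freshly initialized model outputs 0.5 for every flow and therefore escalates everything to the controller; as labeled flows stream in, more decisions move to the switch. This mirrors, in simplified form, the abstract's claim that most decisions can be made locally to reduce detection latency and signaling overhead.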