Academic Paper

A Two-Phase Dynamic Throughput Optimization Model for Big Data Transfers
Document Type
Periodical
Source
IEEE Transactions on Parallel and Distributed Systems (IEEE Trans. Parallel Distrib. Syst.), 32(2):269-280, Feb. 2021
Subject
Computing and Processing
Communication, Networking and Broadcast Technologies
Throughput
Protocols
Data transfer
Data models
Bandwidth
Optimization
Real-time systems
Throughput optimization
big data transfers
offline analysis
dynamic learning
protocol tuning
Language
ISSN
1045-9219
1558-2183
2161-9883
Abstract
The amount of data transferred over dedicated and non-dedicated network links has been growing much faster than network capacity. At the same time, current data transfer solutions fail to guarantee even the promised achievable transfer throughput. In this article, we propose a novel two-phase dynamic throughput optimization model based on mathematical modeling with offline knowledge discovery/analysis and adaptive online decision making. In the offline analysis, we mine historical transfer logs to discover knowledge about the transfer characteristics. The online phase combines the knowledge discovered offline with real-time investigation of the network condition to optimize the protocol parameters. Because real-time investigation is expensive and provides only partial knowledge of the current network status, our model uses historical knowledge about the network and data characteristics to reduce the real-time investigation overhead while ensuring near-optimal throughput for each transfer. Our approach was tested over different networks with different datasets, and it outperformed its closest competitor by 1.7x and the default case by 5x. It also achieved up to 93 percent accuracy relative to the optimal achievable throughput on those networks.
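
The two-phase idea in the abstract can be illustrated with a minimal Python sketch. This is not the authors' model: the log schema (`size_class`, `params`, `throughput`), the parameter tuple, the function names `offline_phase` and `online_phase`, and the `probe` callback are all hypothetical, chosen only to show how offline knowledge can shrink the set of settings that need real-time probing.

```python
from collections import defaultdict
from statistics import mean


def offline_phase(logs, top_k=2):
    """Mine historical logs: for each file-size class, keep the top_k
    parameter settings ranked by mean observed throughput.

    Each log entry is a dict with hypothetical keys:
      size_class -- label for the dataset type, e.g. "large"
      params     -- hashable tuple of protocol parameters (assumed layout)
      throughput -- observed throughput for that transfer
    """
    by_class = defaultdict(lambda: defaultdict(list))
    for entry in logs:
        by_class[entry["size_class"]][entry["params"]].append(entry["throughput"])

    knowledge = {}
    for size_class, per_params in by_class.items():
        ranked = sorted(per_params, key=lambda p: mean(per_params[p]), reverse=True)
        knowledge[size_class] = ranked[:top_k]
    return knowledge


def online_phase(knowledge, size_class, probe):
    """Probe only the candidates shortlisted offline and return the one
    with the best throughput under current conditions.

    probe -- callable(params) -> measured throughput of a short sample
             transfer (an assumption standing in for real-time sampling)
    """
    candidates = knowledge[size_class]
    return max(candidates, key=probe)


# Toy historical logs (made-up numbers, for illustration only).
example_logs = [
    {"size_class": "large", "params": (1, 1, 1), "throughput": 100.0},
    {"size_class": "large", "params": (4, 8, 1), "throughput": 900.0},
    {"size_class": "large", "params": (4, 8, 1), "throughput": 700.0},
    {"size_class": "large", "params": (8, 4, 1), "throughput": 850.0},
]

example_knowledge = offline_phase(example_logs, top_k=2)

# Stand-in for sample transfers on the live network (made-up values).
current_network = {(4, 8, 1): 780.0, (8, 4, 1): 600.0}
best = online_phase(example_knowledge, "large", current_network.get)
```

In this toy run, the offline phase shortlists two of the three settings, so the online phase issues two probes instead of three; the shortlist order reflects history, while the final choice reflects the probed current conditions.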