Academic Paper

Data-Intensive Workflow Optimization Based on Application Task Graph Partitioning in Heterogeneous Computing Systems
Document Type
Conference
Source
2014 IEEE Fourth International Conference on Big Data and Cloud Computing (BdCloud), pp. 129-136, Dec. 2014
Subject
Communication, Networking and Broadcast Technologies
Computing and Processing
Throughput
Partitioning algorithms
Schedules
Optimization
Computational modeling
Data transfer
Data models
Workflow optimization
Partitioning task graph
Heterogeneous computing
Stream-data processing
Language
English
Abstract
The stream-based data processing model is an established method for optimizing data-intensive applications. Data-intensive applications involve the movement of huge amounts of data between execution nodes, which incurs large costs; the data-streaming model improves the execution performance of such applications. In the stream-based data processing model, performance is usually measured by throughput and latency. Optimizing these performance metrics in a heterogeneous computing environment is more challenging because of differences in the computing capacity of execution nodes and variations in the data transfer capability of the communication links between them. This paper presents a dual-objective Partitioning-based Data-intensive Workflow optimization Algorithm (PDWA) for heterogeneous computing systems. The proposed PDWA significantly reduces latency while increasing throughput. In the proposed algorithm, the application task graph is partitioned such that inter-partition data movement is minimal; this optimized partitioning enhances throughput. Each partition is then mapped to the execution node that yields the minimum execution time for that partition. PDWA also exploits partial task duplication to further reduce latency. We evaluated the proposed algorithm with synthesized benchmarks and workflows drawn from real-world workloads; it shows 60% lower latency and a 47% improvement in throughput compared with the case where workflows are not partitioned.
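The abstract's core idea — partition the task graph so that inter-partition data movement is small, then map each partition to the node that executes it fastest — can be sketched as follows. This is a minimal greedy illustration under assumed inputs; all names (`partition`, `map_partitions`, the sample graph and node timings) are hypothetical and do not come from the paper, which does not disclose PDWA's exact heuristic here.

```python
# Illustrative sketch only: a greedy task-graph partitioner plus a
# minimum-execution-time mapper, NOT the actual PDWA algorithm.

# Task graph: edge weights are data-transfer volumes between tasks (assumed).
edges = {("a", "b"): 10, ("a", "c"): 1, ("b", "d"): 8, ("c", "d"): 2}
tasks = ["a", "b", "c", "d"]

# Per-node execution time of each task (heterogeneous nodes, assumed values).
exec_time = {
    "node1": {"a": 2, "b": 3, "c": 5, "d": 4},
    "node2": {"a": 4, "b": 2, "c": 1, "d": 6},
}

def partition(tasks, edges, k=2):
    """Greedy: seed k partitions, then attach each remaining task to the
    partition it shares the most data volume with, reducing the cut weight."""
    parts = [{t} for t in tasks[:k]]
    for t in tasks[k:]:
        def affinity(p):
            return sum(w for (u, v), w in edges.items()
                       if (u == t and v in p) or (v == t and u in p))
        max(parts, key=affinity).add(t)
    return parts

def map_partitions(parts, exec_time):
    """Map each partition to the node with the lowest total execution time."""
    return {frozenset(p): min(exec_time,
                              key=lambda n: sum(exec_time[n][t] for t in p))
            for p in parts}

parts = partition(tasks, edges)
mapping = map_partitions(parts, exec_time)
# Here the heavy edges (a,b)=10 and (b,d)=8 stay inside partitions, so only
# the light edges cross the cut; each partition then goes to its fastest node.
```

In this toy run the partitioner groups {a, c} and {b, d}, keeping the heaviest transfers intra-partition, and the mapper sends {a, c} to node2 (total time 5 vs. 7) and {b, d} to node1 (7 vs. 8) — the same two-stage throughput/latency trade-off the abstract describes.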