Academic Paper

Research the Data Analysis and Processing between MapReduce and Spark

Document Type

Conference

Author

Source

2016 International Conference on Computational Science and Computational Intelligence (CSCI), pp. 1401-1402, Dec. 2016

Subject

Computing and Processing
Sparks
Big Data
Computational modeling
Data analysis
Programming
Data models
Hadoop
MapReduce
Spark
PageRank
Reducer
WordCount

Language

Abstract

Big Data can be defined as large data sets generated from different sources such as social media, audio, imaging, and online website logs. This huge amount of data must be processed and analyzed to extract meaningful information, which can be a challenging task: data at this volume exceeds the capability of traditional databases to capture, manage, and process it. MapReduce and Spark are two common frameworks for performing data analytics on Big Data. Both are open source and capable of cluster computing and fault tolerance. We perform an experimental evaluation to study the performance differences between MapReduce and Spark. Under the conditions of our implementation, we found significant performance differences. We measured the execution times of the common WordCount program and the PageRank algorithm on a single-node setup. We also used CPU utilization as an evaluation metric when processing both batch and iterative jobs.
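
This record does not include the authors' code. As a rough illustration of the WordCount workload named in the abstract, the following is a minimal plain-Python sketch of the map, shuffle, and reduce phases with a wall-clock timing wrapper. It uses neither the Hadoop nor the Spark API, and every function name in it (`map_phase`, `shuffle`, `reduce_phase`, `timed_word_count`) is hypothetical, chosen only for this example.

```python
import time
from collections import defaultdict


def map_phase(lines):
    """Emit a (word, 1) pair for every whitespace-separated token."""
    for line in lines:
        for word in line.lower().split():
            yield word, 1


def shuffle(pairs):
    """Group emitted counts by word, as a framework would between map and reduce."""
    groups = defaultdict(list)
    for word, count in pairs:
        groups[word].append(count)
    return groups


def reduce_phase(groups):
    """Sum the grouped counts for each word."""
    return {word: sum(counts) for word, counts in groups.items()}


def timed_word_count(lines):
    """Run the three phases and return (counts, elapsed_seconds)."""
    start = time.perf_counter()
    counts = reduce_phase(shuffle(map_phase(lines)))
    return counts, time.perf_counter() - start


# Illustrative input only; not data from the paper.
sample = ["big data needs big tools", "spark and mapreduce process big data"]
counts, elapsed = timed_word_count(sample)
print(counts["big"], counts["data"])  # prints: 3 2
```

The timing here covers a single in-memory run and says nothing about the paper's measurements; it only shows the shape of the batch job whose execution time such an evaluation would record.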
