Academic Paper

Research the Data Analysis and Processing between MapReduce and Spark
Document Type
Conference
Source
2016 International Conference on Computational Science and Computational Intelligence (CSCI), pp. 1401-1402, Dec. 2016
Subject
Computing and Processing
Sparks
Big Data
Computational modeling
Data analysis
Programming
Data models
Hadoop
MapReduce
Spark
PageRank
Reducer
WordCount
Abstract
Big Data can be defined as large data sets generated from diverse sources such as social media, audio, images, and online website logs. This huge volume of data must be processed and analyzed to extract meaningful information, which can be a challenging task: Big Data exceeds the capability of traditional databases to capture, manage, and process it. MapReduce and Spark are two common frameworks for performing analytics on Big Data. Both are open source and support cluster computing and fault tolerance. We perform an experimental evaluation to study the performance differences between MapReduce and Spark and, under the conditions of our implementation, found significant differences. We measured the execution times of the common WordCount program and the PageRank algorithm on a single-node setup. A further evaluation metric is CPU utilization when processing both batch and iterative jobs.
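The abstract contrasts a batch workload (WordCount) with an iterative one (PageRank). The plain-Python sketch below is meant only to show that difference in shape. It is not the Hadoop or Spark API, and it is not the authors' benchmark code. WordCount needs a single map, shuffle, and reduce pass. PageRank repeats a map/reduce-style step on every iteration. The function names `word_count` and `page_rank`, the damping factor of 0.85, and the iteration count are illustrative assumptions. The PageRank sketch also assumes that every page has at least one out-link and links only to pages in the graph.

```python
from collections import defaultdict


def word_count(lines):
    """Hypothetical helper: a single map/shuffle/reduce pass (batch job)."""
    # Map phase: emit a (word, 1) pair for every whitespace-separated token.
    pairs = [(word, 1) for line in lines for word in line.split()]
    # Shuffle phase: group the emitted values by key.
    groups = defaultdict(list)
    for word, one in pairs:
        groups[word].append(one)
    # Reduce phase: sum the counts for each word.
    return {word: sum(ones) for word, ones in groups.items()}


def page_rank(links, iterations=10, damping=0.85):
    """Hypothetical helper: repeats a map/reduce-style step (iterative job).

    links maps each page to the list of pages it links to. Assumes no
    dangling pages and that every link target is itself a key of links.
    """
    pages = list(links)
    n = len(pages)
    ranks = {p: 1.0 / n for p in pages}  # start from a uniform distribution
    for _ in range(iterations):
        # Map: each page splits its current rank evenly across its out-links.
        contribs = defaultdict(float)
        for page, outs in links.items():
            for dest in outs:
                contribs[dest] += ranks[page] / len(outs)
        # Reduce: combine incoming contributions with the damping factor.
        ranks = {p: (1 - damping) / n + damping * contribs[p] for p in pages}
    return ranks
```

For example, `word_count(["a b a", "b c"])` returns `{'a': 2, 'b': 2, 'c': 1}`. In a cluster setting, each pass of the PageRank loop would correspond to another round of distributed work over the same link data. This repeated pass is why iterative jobs are commonly used to compare a disk-based MapReduce pipeline against Spark's in-memory processing.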