Performance analysis of big data frameworks on virtualized clusters
Ilham A.A.
Proceedings of the 3rd International Conference on Informatics and Computing (ICIC 2018)
Abstract
Research on Big Data applications has become increasingly important for institutions and researchers worldwide. This trend is driven by the growing use of systems and devices that generate massive amounts of electronic data every day. Conventional algorithms are considered less efficient at managing and processing such large datasets. For Big Data computation, Hadoop and Apache Spark are two commonly used open-source frameworks that typically run on physical clusters. Because running these frameworks on a physical cluster consumes more energy and makes the cluster rigid to manage, in this research we evaluated their performance on virtualized clusters. Virtualization technology makes cluster management more flexible by sharing resources among multiple instances. Our experiments show that, in general, Apache Spark achieves about 2 to 9 times better execution time and throughput than Hadoop when both run in a virtualized environment.