Universitas Hasanuddin
Research output: Contribution to journal › Article › peer-review

Performance analysis of big data frameworks on virtualized clusters

Ilham A.A.

Proceedings of the 3rd International Conference on Informatics and Computing, ICIC 2018

Published: 2018

Abstract

Research on Big Data applications has become increasingly important for institutions and researchers worldwide. This trend is driven by the increasing use of systems and devices that generate massive amounts of electronic data each day. Conventional algorithms are considered less efficient at managing and processing large datasets. In Big Data computation, Hadoop and Apache Spark are two commonly used open-source frameworks, and they typically run on physical clusters. Because running these frameworks on a physical cluster consumes more energy and is rigid to manage, in this research we evaluated their performance on virtualized clusters. Virtualization technology offers flexibility in cluster management by sharing resources among multiple instances. Our experiments show that, in general, Apache Spark is about 2 to 9 times better than Hadoop in execution time and throughput when running in a virtualized environment.

Access to Document

10.1109/IAC.2018.8780502

Fingerprint

SPARK (programming language) - Sciences
Big data - Sciences
Computer science - Sciences
Virtualization - Sciences
Throughput - Sciences
Flexibility (engineering) - Sciences
Cluster (spacecraft) - Sciences
Computer cluster - Sciences
Distributed computing - Sciences
Operating system - Sciences
Computation - Sciences
Cloud computing - Sciences
Wireless - Sciences
Statistics - Sciences
Programming language - Sciences
Mathematics - Sciences
Algorithm - Sciences