Type of Document Dissertation Author Wang, Guanying Author's Email Address email@example.com URN etd-08282012-152556 Title Evaluating MapReduce System Performance: A Simulation Approach Degree PhD Department Computer Science Advisory Committee
Advisor Name Title Butt, Ali R. A. Committee Chair Cameron, Kirk W. Committee Member Feng, Wu-Chun Committee Member Nikolopoulos, Dimitrios S. Committee Member Pandey, Prashant Committee Member Keywords
- performance prediction
- performance modeling
Date of Defense 2012-08-27 Availability unrestricted AbstractScale of data generated and processed is exploding in the Big Data era. The MapReduce system popularized by open-source Hadoop is a powerful tool for the exploding data problem, and is widely employed in many areas involving large scale of data. In many circumstances, hypothetical MapReduce systems must be evaluated, e.g. to provision a new MapReduce system to provide certain performance goal, to upgrade a currently running system to meet increasing business demands, to evaluate novel network topology, new scheduling algorithms, or resource arrangement schemes. The traditional trial-and-error solution involves the time-consuming and costly process in which a real cluster is first built and then benchmarked. In this dissertation, we propose to simulate MapReduce systems and evaluate hypothetical MapReduce systems using simulation. This simulation approach offers significantly lower turn-around time and lower cost than experiments. Simulation cannot entirely replace experiments, but can be used as a preliminary step to reveal potential flaws and gain critical insights.
We studied MapReduce systems in detail and developed a comprehensive performance model for MapReduce, including sub-task phase level performance models for both map and reduce tasks and a model for resource contention between multiple processes running in concurrent. Based on the performance model, we developed a comprehensive simulator for MapReduce, MRPerf. MRPerf is the first full-featured MapReduce simulator. It supports both workload simulation and resource contention, and it still offers the most complete features among all MapReduce simulators to date. Using MRPerf, we conducted two case studies to evaluate scheduling algorithms in MapReduce and shared storage in MapReduce, without building real clusters.
Furthermore, in order to further integrate simulation and performance prediction into MapReduce systems and leverage predictions to improve system performance, we developed online prediction framework for MapReduce, which periodically runs simulations within a live Hadoop MapReduce system. The framework can predict task execution within a window in near future. These predictions can be used by other components in MapReduce systems in order to improve performance. Our results show that the framework can achieve high prediction accuracy and incurs negligible overhead. We present two potential use cases, prefetching and dynamic adapting scheduler.
Filename Size Approximate Download Time (Hours:Minutes:Seconds)
28.8 Modem 56K Modem ISDN (64 Kb) ISDN (128 Kb) Higher-speed Access Wang_G_D_2012.pdf 1.14 Mb 00:05:15 00:02:42 00:02:22 00:01:11 00:00:06
If you have questions or technical problems, please Contact DLA.