
MapReduce System Analysis


CHAPTER 4

SYSTEM ANALYSIS

4.1 Existing System

The original MapReduce implementation at Google and Dryad use the same speculative execution mechanism. They start speculative execution only when the map or reduce phase is close to completion. At that point they choose an arbitrary set of the remaining tasks to back up, as long as slots are available, and mark a task as finished when one of its attempts completes. This method is very simple and intuitive. However, it does not consider the following questions: i) Are those remaining tasks really slow, or do they simply have more data to process? ii) Is the worker node chosen to run a backup task actually fast? iii) Can the backup task finish before the original one?

Hadoop-Original improves on this mechanism by using the progress of a task, and starts speculative execution when a job has no new map or reduce task to assign. It identifies a task as a straggler only when the task's progress falls behind the average progress of all tasks by a SPECULATIVE GAP (i.e., 0.2). However, LATE finds that Hadoop-Original can be misled in heterogeneous environments and therefore makes several changes. It tracks the progress rate of tasks and estimates their remaining time. Tasks whose progress rate is below slowTaskThreshold are chosen as backup candidates, among which the one with the longest remaining time is given the highest priority to be backed up. Furthermore, LATE considers a worker node slow if its performance score (the total progress, or the average progress rate, of all the succeeded and running tasks on it) is below the slowNodeThreshold. It never launches any speculative task on these slow worker nodes. Moreover, LATE limits the number of backup tasks by speculativeCap. Compared with Hadoop-Original, it addresses questions i) and ii), but still has several problems. Hadoop-LATE is an implementation of the LATE strategy in Hadoop-0.21. It replaces the slowTaskThreshold and slowNodeThreshold with the standard deviation (STD) of all tasks' progress rates. The rationale is to let the STD adjust the thresholds automatically. However, this may still lead to misjudgment, as we will see later.
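To make the LATE heuristic above concrete, the following is a minimal sketch only. The TaskInfo and NodeInfo types and the field names are hypothetical placeholders, not the actual Hadoop classes; the thresholds slowTaskThreshold, slowNodeThreshold and speculativeCap are taken from the description above.

```java
import java.util.Comparator;
import java.util.List;
import java.util.Optional;

// Hypothetical records standing in for the scheduler's bookkeeping.
record TaskInfo(String id, double progress, double progressRate) {
    // Remaining time = work left / progress rate observed so far.
    double remainingTime() { return (1.0 - progress) / progressRate; }
}
record NodeInfo(String id, double performanceScore) {}

class LateSketch {
    double slowTaskThreshold;   // progress-rate threshold for backup candidates
    double slowNodeThreshold;   // performance-score threshold for slow nodes
    int speculativeCap;         // cap on concurrent backup tasks
    int runningBackups;

    LateSketch(double taskThr, double nodeThr, int cap) {
        slowTaskThreshold = taskThr; slowNodeThreshold = nodeThr; speculativeCap = cap;
    }

    /** Pick the slow task with the longest remaining time, if the cap allows. */
    Optional<TaskInfo> pickBackupCandidate(List<TaskInfo> running) {
        if (runningBackups >= speculativeCap) return Optional.empty();
        return running.stream()
                .filter(t -> t.progressRate() < slowTaskThreshold)  // slow task
                .max(Comparator.comparingDouble(TaskInfo::remainingTime));
    }

    /** LATE never launches a speculative task on a node it considers slow. */
    boolean mayHostBackup(NodeInfo node) {
        return node.performanceScore() >= slowNodeThreshold;
    }
}
```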

4.1.1 Pitfalls in the Previous Work

  1. Pitfalls in Selecting Backup Candidates

Hadoop-LATE and LATE use the average progress rate to identify slow tasks and estimate their remaining time. They are based on the following assumptions:

• Tasks of the same type (map or reduce) process roughly the same amount of data.

• The progress rate is either stable or accelerating during a task's execution.

In the following, we present several scenarios in which these assumptions break down.

  2. Pitfalls in Selecting Backup Worker Nodes
  a) Identifying Slow Worker Nodes Improperly

LATE and Hadoop-LATE use a threshold (e.g., slowNodeThreshold) to identify straggler nodes. LATE uses the total progress of all the completed and running tasks on a worker node to represent the performance score of the node, while Hadoop-LATE uses the average progress rate of all the completed tasks on the node. Both approaches consider a worker node slow when its performance score falls below the average performance score of all nodes by more than the threshold, and will never launch any speculative task on such a slow node.

However, some worker nodes may execute more time-consuming tasks and receive an unfairly low performance score. For instance, they may run tasks with a larger amount of data to process, or they may run more non-local map tasks. As a result, such worker nodes are mistakenly classified as slow.
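The sketch below illustrates this pitfall under assumed numbers (the progress rates, the two-node cluster and the 0.25 gap are invented for illustration): a node whose hardware is no slower, but which happens to receive larger or non-local tasks, ends up below the cluster-average score and is flagged as slow.

```java
import java.util.List;

// Illustration only: an average-progress-rate score penalizes a node that was
// simply assigned heavier (larger-input or non-local) tasks.
class NodeScoreExample {
    /** Performance score = average progress rate of the node's tasks. */
    static double performanceScore(List<Double> taskProgressRates) {
        return taskProgressRates.stream()
                .mapToDouble(Double::doubleValue).average().orElse(0.0);
    }

    public static void main(String[] args) {
        // Node A ran small, data-local tasks; node B has identical hardware,
        // but its tasks had larger inputs / were non-local, so rates look lower.
        double scoreA = performanceScore(List.of(0.010, 0.011, 0.012));
        double scoreB = performanceScore(List.of(0.004, 0.005, 0.004));
        double clusterAvg = (scoreA + scoreB) / 2;
        double gap = 0.25 * clusterAvg;   // assumed slow-node gap, for illustration

        // Node B is flagged as "slow" purely because of the work it was given.
        System.out.printf("A slow? %b, B slow? %b%n",
                scoreA < clusterAvg - gap,
                scoreB < clusterAvg - gap);
    }
}
```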

  b) Choosing Backup Worker Nodes Improperly

Neither LATE nor Hadoop-LATE uses data locality to check whether a backup task can finish earlier when choosing backup nodes. They assume that network utilization is sufficiently low during the map phase because most map tasks are data-local, and therefore that non-local map tasks can run as fast as data-local ones. However, this assumption can easily break down: i) in a MapReduce cluster where several jobs run concurrently, the network bandwidth may be fully used because other jobs are busy copying map outputs to reduce tasks or writing the final outputs of reduce tasks to some persistent file system; ii) reduce tasks copy map outputs concurrently with the execution of map tasks, leading to bandwidth contention. In fact, we have observed that a data-local map task can execute more than three times faster than a non-local map task, motivating us to take data locality into account in our solution.
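One simple way to reflect this observation when estimating backup times is to inflate the estimate for a non-local map task by a locality penalty. The sketch below assumes such a penalty; the factor of 3 merely mirrors the measurement reported above and would be re-measured per cluster rather than treated as a fixed constant.

```java
// Sketch: estimate the execution time of a backup attempt, penalizing
// non-local map tasks. The 3x penalty is an assumed value taken from the
// observation above, not a constant defined by the system.
class BackupTimeEstimator {
    static final double NON_LOCAL_PENALTY = 3.0;   // assumed, from the ~3x observation

    /**
     * @param avgLocalMapTime average execution time of data-local map tasks (seconds)
     * @param dataLocal       whether the candidate backup node holds the input split
     * @return estimated backup execution time on that node (seconds)
     */
    static double estimatedBackupTime(double avgLocalMapTime, boolean dataLocal) {
        return dataLocal ? avgLocalMapTime : avgLocalMapTime * NON_LOCAL_PENALTY;
    }
}
```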

4.2 Proposed System

The problem addressed by the proposed system is the following: when a job is submitted to a MapReduce cluster, some nodes may run very slowly due to process overloading or hardware inefficiency. When many tasks are submitted to the cluster, almost all of them may complete while a few tasks keep processing very slowly, delaying the whole job. The solution is to run those slow tasks on other slave nodes in the cluster to obtain better performance.

In this project we propose a new speculative execution strategy for maximum cost performance. We consider the cost to be the computing resources occupied by tasks, and the performance to be the shortening of job execution time and the increase of cluster throughput. The strategy aims at selecting straggler tasks (slow-running tasks in the cluster) accurately and promptly, and backing them up on appropriate fast worker nodes. To guarantee fairness, it assigns task slots in the order in which jobs are submitted. Like other speculative execution strategies, it gives new tasks a higher priority than backup tasks; in other words, it will not start backing up straggler map/reduce tasks until all new map/reduce tasks of the job have been assigned. The strategy picks backup candidates based on a prompt prediction of the tasks' process speed and an accurate estimation of their remaining time. These backup candidates are then selectively backed up on proper worker nodes, according to the cluster load, to achieve maximum cost performance.
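The overall flow of the proposed strategy can be sketched as follows. This is only an outline under the description above; the method names are placeholders standing in for the components detailed in the implementation section, not the project's actual code.

```java
import java.util.List;

// High-level sketch of the proposed scheduling loop.
abstract class SpeculativeSchedulerSketch {
    void onSlotAvailable(String node) {
        // 1. New map/reduce tasks always take priority over backups.
        if (hasPendingNewTask()) { assignNewTask(node); return; }

        // 2. Pick straggler candidates from their predicted process speed
        //    and estimated remaining time.
        List<String> stragglers = selectStragglers();

        // 3. Back up a straggler only if this node is a suitable (fast,
        //    preferably data-local) host and the backup is profitable
        //    under the current cluster load.
        for (String task : stragglers) {
            if (isSuitableBackupNode(node, task) && backupIsProfitable(task, node)) {
                launchBackup(task, node);
                return;
            }
        }
    }

    abstract boolean hasPendingNewTask();
    abstract void assignNewTask(String node);
    abstract List<String> selectStragglers();
    abstract boolean isSuitableBackupNode(String node, String task);
    abstract boolean backupIsProfitable(String task, String node);
    abstract void launchBackup(String task, String node);
}
```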

In this project the slow-running tasks are identified using an EWMA (exponentially weighted moving average) algorithm, with which a task's expected process speed is estimated accurately instead of relying on its plain historical average process speed.
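A minimal sketch of such an EWMA update is shown below. The smoothing factor alpha is an assumed parameter (values closer to 1 weight the most recent observations more heavily); it is not a value specified by the project.

```java
// Exponentially weighted moving average of a task's process speed.
class EwmaSpeed {
    private final double alpha;     // assumed smoothing factor in (0, 1]
    private double estimate;
    private boolean initialized;

    EwmaSpeed(double alpha) { this.alpha = alpha; }

    /** Feed the speed observed in the latest progress report (e.g. bytes/s). */
    double update(double observedSpeed) {
        if (!initialized) {
            estimate = observedSpeed;
            initialized = true;
        } else {
            estimate = alpha * observedSpeed + (1 - alpha) * estimate;
        }
        return estimate;
    }

    /** Expected process speed, used to estimate the task's remaining time. */
    double expectedSpeed() { return estimate; }
}
```

With the expected speed in hand, a task's remaining time can be estimated as the amount of data left to process divided by expectedSpeed().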

Then a proper backup node is selected to ensure that the speculation is beneficial, based on the task's remaining time and its backup time. A detailed explanation is given in the implementation section.

The speculative execution decision is made by comparing the profit of backing up a task with the profit of not backing it up. If the profit of backing up the task is higher, speculation takes place, and the slow-running task on the straggler machine is re-executed on an alternative machine in the cluster.
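A minimal sketch of that decision is given below, assuming the remaining-time and backup-time estimates from the previous paragraphs are available. Here the profit is taken simply as the completion time saved, so the backup is launched only when it is expected to finish before the original attempt; the full cost model is left to the implementation section.

```java
// Sketch of the speculation decision: back up a straggler only if the backup
// attempt is expected to finish earlier than letting the original attempt run on.
class SpeculationDecision {
    /**
     * @param remainingTime estimated remaining time of the original (straggler) attempt
     * @param backupTime    estimated execution time of a fresh backup on the candidate node
     * @return true if backing up is more profitable than not backing up
     */
    static boolean shouldBackup(double remainingTime, double backupTime) {
        double profitWithBackup    = remainingTime - backupTime;  // time saved if the backup wins
        double profitWithoutBackup = 0.0;                         // keep only the original attempt
        return profitWithBackup > profitWithoutBackup;
    }
}
```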
