问题描述:

Does stateless node mean just being independent of each others? can you explain this concept w.r.t to hadoop

网友答案:

The explanation can be as follows: each mapper/reducer has no idea about all the other mappers/reducers (i.e. about their current states, their particular outputs if any, etc.). Such statelessness is not great for certain data processing workloads (e.g. graph data) but allows easy parallelization (a particular map/reduce task can be run on any node, meaning a failed mapper/reducer is not an issue, just start a new one on the same input split/mappers' outputs).

网友答案:

I would say that statefulness of the nodes in computing infrastructures has slightly different meaning from what you have defined. Remember there is always coordination process running somewhere, so there is no complete independence between the nodes.

What it can actually mean in computing infrastructures is that the nodes does not store anything about the computation they are performing on persistent storage. Consider the following, you have master running on some machine delegating the tasks to the workers, the workers maintain the information in RAM and retrieve it from RAM when necessary for task computation. Workers also write results into RAM. You can consider the worker nodes as stateless, since whenever the worker node fails (from power cut for example) it would not have any mechanism which would allow it to recover the execution from the point it has stopped at. But still master will know that the node has failed and would delegate the task to another machine in the cluster.

Regarding Hadoop, the architecture is statefull, first of all, because whenever the job is starting its execution it will transfer all the metadata to the worker node (the jar file, split location, etc). Secondly, when the job is scheduled on the node which does not contain the input data, it will be transferred there. Additionally, the intermediate data is being stored on the disk, exactly for failure recovery reasons, so the failure recovery mechanisms can resume the job from the point where execution has stopped.

相关阅读:
Top