MapReduce:
MapReduce is a core component of the Apache Hadoop software framework.
MapReduce serves two essential functions:
- It parcels out work to the various nodes within the cluster (the map step).
- It organizes and reduces the results from each node into a cohesive answer to a query (the reduce step).
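These two functions are easiest to see in the classic word-count example. The sketch below is illustrative Python, not Hadoop code: the map step emits intermediate (word, 1) pairs, and the reduce step combines each word's values into a total.

```python
from collections import defaultdict

def map_phase(document):
    """Map: emit an intermediate (word, 1) pair for every word."""
    return [(word, 1) for word in document.split()]

def shuffle(pairs):
    """Group intermediate pairs by key (the framework does this between the phases)."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(groups):
    """Reduce: combine each key's list of values into a single result."""
    return {word: sum(counts) for word, counts in groups.items()}

documents = ["the quick brown fox", "the lazy dog"]
pairs = [pair for doc in documents for pair in map_phase(doc)]
result = reduce_phase(shuffle(pairs))
# result["the"] == 2, result["fox"] == 1
```

In a real cluster, map_phase would run on many nodes in parallel over different input splits, and reduce_phase would run on the nodes assigned each key range.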
Components of MapReduce:
JobTracker : the master node that manages all jobs and resources in a cluster
TaskTrackers : agents deployed to each machine in the cluster to run the map and reduce tasks
JobHistoryServer : a component that tracks completed jobs; it is typically deployed as a separate function or alongside the JobTracker
To distribute input data and collate results, MapReduce operates in parallel across massive cluster sizes. Because cluster size doesn't affect a processing job's final results, jobs can be split across almost any number of servers. MapReduce is also fault-tolerant, with each node periodically reporting its status to a master node. If a node doesn't respond as expected, the master node re-assigns that piece of the job to other available nodes in the cluster. This creates resiliency and makes it practical for MapReduce to run on inexpensive commodity servers.
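The heartbeat-and-reassign behavior described above can be sketched as a toy master (a simplification, not actual JobTracker code): the master records the last status report from each node and, when a node goes silent past a timeout, reclaims that node's tasks so they can be re-assigned elsewhere.

```python
HEARTBEAT_TIMEOUT = 10.0  # seconds; illustrative value, not Hadoop's default

class Master:
    def __init__(self):
        self.last_heartbeat = {}  # node -> timestamp of last status report
        self.assignments = {}     # node -> list of task ids running there
        self.pending = []         # tasks waiting to be re-assigned

    def heartbeat(self, node, now):
        """A node periodically reports its status to the master."""
        self.last_heartbeat[node] = now

    def check_failures(self, now):
        """Reclaim tasks from any node that has missed its heartbeat window."""
        for node, seen in list(self.last_heartbeat.items()):
            if now - seen > HEARTBEAT_TIMEOUT:
                self.pending.extend(self.assignments.pop(node, []))
                del self.last_heartbeat[node]
```

For example, if "node-a" reports at time 0 and the master checks at time 20, node-a's tasks move to the pending queue for re-assignment to healthy nodes.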
MapReduce handles:
- Scheduling: assigning workers to map and reduce tasks.
- Data distribution: moving computation to where the data lives.
- Synchronization: gathering, sorting, and shuffling intermediate data.
- Errors and faults: detecting worker failures and restarting failed tasks.
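The synchronization step above (gather, sort, shuffle) can be sketched in a few lines of Python. The intermediate pairs here are hypothetical mapper output; sorting by key and grouping is what guarantees each reducer sees all values for its keys together.

```python
from itertools import groupby
from operator import itemgetter

# Hypothetical intermediate pairs emitted by several mappers.
intermediate = [("b", 1), ("a", 1), ("b", 1), ("a", 1), ("c", 1)]

# Shuffle/sort: sort by key, then group so each key's values arrive together.
intermediate.sort(key=itemgetter(0))
grouped = {key: [v for _, v in group]
           for key, group in groupby(intermediate, key=itemgetter(0))}
# grouped == {"a": [1, 1], "b": [1, 1], "c": [1]}
```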