MapReduce:

MapReduce is a core component of the Apache Hadoop software framework. It serves two essential functions:

  • It parcels out work to the various nodes within the cluster (the map step).
  • It organizes and reduces the results from each node into a cohesive answer to a query (the reduce step).
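The two functions above can be sketched with the classic word-count example. This is a hypothetical single-machine illustration, not Hadoop's actual API: `map_phase` stands in for the map step and `reduce_phase` for the reduce step.

```python
from collections import defaultdict

def map_phase(document):
    """Map step: emit a (word, 1) pair for every word in the input split."""
    for word in document.split():
        yield (word, 1)

def reduce_phase(pairs):
    """Reduce step: combine all values that share a key into one result."""
    counts = defaultdict(int)
    for key, value in pairs:
        counts[key] += value
    return dict(counts)

result = reduce_phase(map_phase("the quick fox the fox"))
# result == {"the": 2, "quick": 1, "fox": 2}
```

In a real cluster, many mappers run `map_phase` on different input splits in parallel, and the framework routes each key's pairs to the reducer responsible for it.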

Components of MapReduce:

  • JobTracker : the master node that manages all jobs and resources in a cluster

  • TaskTrackers : agents deployed to each machine in the cluster to run the map and reduce tasks

  • JobHistoryServer : a component that tracks completed jobs; it is typically deployed as a separate function or alongside the JobTracker

To distribute input data and collate results, MapReduce operates in parallel across massive cluster sizes. Because cluster size doesn't affect a processing job's final results, jobs can be split across almost any number of servers. MapReduce is also fault-tolerant, with each node periodically reporting its status to a master node. If a node doesn't respond as expected, the master node re-assigns that piece of the job to other available nodes in the cluster. This creates resiliency and makes it practical for MapReduce to run on inexpensive commodity servers.
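The fault-tolerance mechanism described above can be sketched as a heartbeat check: the master re-assigns a node's work when its last status report is older than a timeout. The function names, timeout value, and round-robin re-assignment here are illustrative assumptions, not Hadoop's internal logic.

```python
TIMEOUT = 10.0  # seconds of silence before a worker is presumed failed (assumed value)

def find_failed_workers(last_heartbeat, now):
    """Return the workers whose most recent heartbeat is older than TIMEOUT."""
    return [w for w, t in last_heartbeat.items() if now - t > TIMEOUT]

def reassign(assignments, failed, available):
    """Move every task owned by a failed worker to a live one (round-robin)."""
    i = 0
    for task, worker in assignments.items():
        if worker in failed:
            assignments[task] = available[i % len(available)]
            i += 1
    return assignments

# node-b last reported 15 s ago, so its piece of the job moves to node-a.
failed = find_failed_workers({"node-a": 99.0, "node-b": 85.0}, now=100.0)
tasks = reassign({"split-1": "node-a", "split-2": "node-b"}, failed, ["node-a"])
```

Because map and reduce tasks are deterministic over their input splits, re-running a task on another node yields the same result, which is what makes this cheap recovery possible on commodity hardware.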

MapReduce handles:

  • Scheduling : assigning workers to map and reduce tasks
  • Data distribution : moving processes to the data
  • Synchronization : gathering, sorting, and shuffling intermediate data
  • Errors and faults : detecting worker failures and restarting their tasks
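The synchronization step in the list above, often called the shuffle, can be sketched as sorting the intermediate pairs by key and grouping them so each reducer sees all values for its key together. This is a minimal in-memory sketch; the function name `shuffle` is an assumption for illustration.

```python
from itertools import groupby
from operator import itemgetter

def shuffle(intermediate):
    """Sort intermediate (key, value) pairs by key, then group values per key."""
    pairs = sorted(intermediate, key=itemgetter(0))
    return {key: [v for _, v in group]
            for key, group in groupby(pairs, key=itemgetter(0))}

grouped = shuffle([("fox", 1), ("the", 1), ("fox", 1)])
# grouped == {"fox": [1, 1], "the": [1]}
```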
