MR (Map Reduce)
Introduction to Map Reduce
- It's a massively parallel processing framework.
- In general, parallel processing is hard to do without distributed data.
- Map Reduce uses Java by default.
- The aim of Map Reduce is to achieve data locality and parallelism.
- Spark is an alternative to Map Reduce (roughly 80% overlap in functionality).
- Hive, Sqoop, Pig, and Oozie are abstractions over Map Reduce.
- The daemons in Map Reduce are the Job Tracker (JT) and the Task Tracker (TT).
- Map Reduce also has two functions, the Mapper and the Reducer, which are responsible for the processing.
- A disadvantage of MR is resource pressure (the JT alone handles scheduling, monitoring, and resource allocation).
Advantages of Map Reduce
- Cluster Monitoring
- Cluster Management
- Resource Allocating
- Scheduling
- Execution
- Speculative Execution
Working Principle of Map Reduce
- The task for MR is given to the Job Tracker in the form of a JAR (Java) file (see the driver sketch after this list).
- The Job Tracker sends a request to the Name Node (for the block locations) and gets a response in return.
- Then the Job Tracker sends the task information to the Task Trackers.
- The Task Tracker requests the data from the Data Node and performs the Mapper task (MAP JVM).
- The Mapper tasks run in parallel on the nodes nearest to the data; when they are done, the Reducer executes its task on an existing node or a new node.
- The Reducer pulls the Mapper output via the HTTP protocol.
- After the Reducer job is finished, the result is stored in the local file system.
- Then the result is transferred to HDFS.
- The Task Tracker sends a heartbeat to the JT every 3 seconds (similar to HDFS), along with the job status.
- If a slave node fails, the JT gets the replica locations from the NN and the task is restarted on a replicated node; otherwise the job fails.
- The Mapper and Reducer JVMs send their job status to the slave node (TT).
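To make the flow above concrete, here is a minimal driver sketch using the org.apache.hadoop.mapreduce API; the names WordCountDriver, WordCountMapper, and WordCountReducer are assumed examples, not part of these notes (the latter two are sketched in the Mapper and Reducer sections below):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCountDriver
{
    public static void main(String[] args) throws Exception
    {
        Configuration conf = new Configuration();
        Job job = Job.getInstance(conf, "word count");
        job.setJarByClass(WordCountDriver.class);    // locates the JAR handed to the JT
        job.setMapperClass(WordCountMapper.class);   // assumed example class
        job.setReducerClass(WordCountReducer.class); // assumed example class
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));   // input in HDFS
        FileOutputFormat.setOutputPath(job, new Path(args[1])); // output in HDFS
        System.exit(job.waitForCompletion(true) ? 0 : 1);       // submit and wait
    }
}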
About Mapper
- Input for the mapper is given in the form of blocks.
- The storage can be HDFS, NoSQL, RDBMS, or any other storage layer.
- No. of Mappers = No. of Blocks (the default, but not always).
- By default the TT will create 2 MAP JVMs (i.e., run 2 map tasks at a time).
- If the node has been assigned more than 2 tasks, the remaining tasks wait in a queue.
- The output is stored in the local file system and is called intermediate data (see the Mapper sketch below).
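A minimal Mapper sketch (an assumed word-count example; the class name WordCountMapper is illustrative). With the default Text input format, the key is the byte offset of the record and the value is the whole line:

import java.io.IOException;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class WordCountMapper extends Mapper<LongWritable, Text, Text, IntWritable>
{
    private static final IntWritable ONE = new IntWritable(1);
    private final Text word = new Text();

    @Override
    protected void map(LongWritable offset, Text line, Context context)
            throws IOException, InterruptedException
    {
        // Emit an intermediate (word, 1) pair for every token in the record.
        for (String token : line.toString().split("\\s+")) {
            if (!token.isEmpty()) {
                word.set(token);
                context.write(word, ONE);
            }
        }
    }
}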
About Reducer
- Input for the reducer is the output of the mapper.
- The storage can be HDFS, NoSQL, RDBMS, or any other storage layer.
- The number of Reducers is decided by the developer.
- The Reducers' output can't be reduced further.
- The output is stored in the local file system and then moved to HDFS (see the Reducer sketch below).
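A matching Reducer sketch (again an assumed word-count example; WordCountReducer is illustrative):

import java.io.IOException;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

public class WordCountReducer extends Reducer<Text, IntWritable, Text, IntWritable>
{
    @Override
    protected void reduce(Text word, Iterable<IntWritable> counts, Context context)
            throws IOException, InterruptedException
    {
        // The framework has already grouped the Mapper output by key;
        // here we just sum the counts for each word.
        int sum = 0;
        for (IntWritable count : counts) {
            sum += count.get();
        }
        context.write(word, new IntWritable(sum)); // final output, lands in HDFS
    }
}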
Map Reduce Input/Output Format
- Text Input & Text Output format (the default; see the snippet after this list):
  - Key = byte offset of the record
  - Value = the whole line of the record
- Key Value Input & Key Value Output format:
  - Key = the text before the first tab delimiter
  - Value = the remaining line
- Mapper side: SELECT-like statements (filtering and projection).
- Reducer side: GROUP BY-like statements (aggregation).
- Shuffling on the reducer side is of two types:
  - Sort by key
  - Group by key
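A sketch of how the driver chooses between the two formats (assuming the job object from the driver sketch above; both classes are real Hadoop input formats):

import org.apache.hadoop.mapreduce.lib.input.KeyValueTextInputFormat;
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;

// Text input format (the default): key = byte offset, value = whole line.
job.setInputFormatClass(TextInputFormat.class);

// Key Value input format: key = text before the first tab, value = the rest.
job.setInputFormatClass(KeyValueTextInputFormat.class);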
Map Reduce Programming Part
Syntax:
class ABC
{
    static class mapper {}
    static class reducer {}
    public static void main(String[] args)
    {}
}
- Data types
- Java Data types: int, float, String, long
- MR Data types: IntWritable, FloatWritable, Text, LongWritable
- When working with the Key/Value Input/Output formats, use the MR data types; otherwise use the Java data types (see the comparison below).
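A small illustrative comparison (DataTypeDemo is an assumed example; the Writable classes are the real Hadoop wrappers):

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;

class DataTypeDemo
{
    public static void main(String[] args)
    {
        int count = 1;                                    // Java data type
        IntWritable mrCount = new IntWritable(count);     // MR data type (serializable Writable)
        String word = "hadoop";                           // Java data type
        Text mrWord = new Text(word);                     // MR data type
        long offset = 0L;                                 // Java data type
        LongWritable mrOffset = new LongWritable(offset); // MR data type
        System.out.println(mrWord + " -> " + mrCount + " @ " + mrOffset);
    }
}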
- The Mapper and Reducer have these functions (see the sketch after this list):
  - map()
  - reduce()
  - setup()
  - cleanup()
  - run()
- The map() function is executed once for each record in a block (and reduce() once for each key).
- The setup() and cleanup() functions are executed only once per task.
- It is preferred to use the main() method instead of the run() method.
- The list of values for each key is handled internally by the framework.
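To show the lifecycle, here is a small Mapper sketch (StopWordMapper and its stop-word list are assumed examples): setup() runs once before the first record, map() runs once per record, and cleanup() runs once after the last record:

import java.io.IOException;
import java.util.HashSet;
import java.util.Set;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class StopWordMapper extends Mapper<LongWritable, Text, Text, IntWritable>
{
    private static final IntWritable ONE = new IntWritable(1);
    private final Set<String> stopWords = new HashSet<>();
    private long records = 0;

    @Override
    protected void setup(Context context)
    {
        // Executed only once per map task, before the first record.
        stopWords.add("the");
        stopWords.add("a");
        stopWords.add("an");
    }

    @Override
    protected void map(LongWritable offset, Text line, Context context)
            throws IOException, InterruptedException
    {
        // Executed once for each record in the block.
        records++;
        for (String token : line.toString().split("\\s+")) {
            if (!token.isEmpty() && !stopWords.contains(token)) {
                context.write(new Text(token), ONE);
            }
        }
    }

    @Override
    protected void cleanup(Context context)
    {
        // Executed only once per map task, after the last record.
        System.err.println("Processed " + records + " records");
    }
}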