Tutorial 0: Hadoop MapReduce Partitioner


  • We will start with a simple Hadoop job. Suppose we have a large file containing many words separated by whitespace, and we want to count how many times each word appears. We also want the words starting with a letter in [A-L] to end up in the first output part and all other words in the second part.

 

  • Let’s start by installing Hadoop, following this link.

 

    • Then start the Hadoop daemons with the scripts shipped in Hadoop's sbin directory (typically start-dfs.sh followed by start-yarn.sh on Hadoop 2 and later).

 

  • One last step before starting: create some directories in HDFS, then copy the input files from your local file system into them.

 

  • Download the two input files (small files, just for testing): download link

 

    • After that, create the input directory in HDFS, for example with hdfs dfs -mkdir -p followed by the HDFS path you want to use.

 

    • Then copy the files to HDFS, for example with hdfs dfs -put followed by the local files and the HDFS directory you created. If you downloaded the files into Downloads/lab0/inputs/, the local source would be Downloads/lab0/inputs/*.

 

  • First, create a job class that extends the Configured class and implements the Tool interface. This class gives the job all the information it needs: the input format, the output format, the mapper, the reducer, the partitioner, the output key and value types of the mapper and the reducer, and so on.

 

  • Let’s look at how the mapper works.

 

  • In our case, the role of the mapper is to emit each word as a key, with 1 as its value.
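The mapper logic described above can be sketched in plain Java. This is only an illustration that runs without Hadoop on the classpath: the class name WordMapperSketch and the sample line are hypothetical, and in a real job the same loop would sit inside the map method of a class extending Hadoop's Mapper, writing each pair to the context instead of collecting it in a list.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Plain-Java sketch of the map step; no Hadoop classes are used (assumption:
// this only illustrates the logic, it is not the tutorial's actual mapper).
final class WordMapperSketch {

    // Emits one (word, 1) pair for every whitespace-separated token in the line.
    static List<Map.Entry<String, Integer>> map(String line) {
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        for (String word : line.trim().split("\\s+")) {
            if (!word.isEmpty()) { // a blank line yields a single empty token
                pairs.add(new SimpleEntry<>(word, 1));
            }
        }
        return pairs;
    }

    public static void main(String[] args) {
        // Hypothetical input line, for illustration only.
        for (Map.Entry<String, Integer> pair : map("hadoop apple hadoop")) {
            System.out.println(pair.getKey() + "\t" + pair.getValue());
        }
    }
}
```

Note that the mapper does not sum anything: a word that appears twice is simply emitted twice, and the counting is left to the reducer.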

 

 

  • Now let’s have a look at the partitioner. It should extend Partitioner<MapperOutputKeyType, MapperOutputValueType>.

 

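  • In our case, its role is to forward each key from the mapper to a specific reducer: words starting with a letter in [A-L] go to the first reducer, and all other words go to the second.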
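As a rough illustration of that routing rule, here is a plain-Java sketch that runs without Hadoop. The class name AlphabetPartitionerSketch is hypothetical, and treating lowercase and uppercase letters alike is an assumption, since the tutorial only mentions [A-L]. In a real job this logic would go into getPartition(key, value, numPartitions) of a class extending Hadoop's Partitioner.

```java
// Plain-Java sketch of the partitioning rule; no Hadoop classes are used.
final class AlphabetPartitionerSketch {

    // Returns the index of the reducer that should receive this word.
    static int getPartition(String word, int numPartitions) {
        if (numPartitions < 2 || word.isEmpty()) {
            return 0; // with a single reducer, everything goes to partition 0
        }
        // Assumption: the comparison ignores case, so "apple" and "Apple" match.
        char first = Character.toUpperCase(word.charAt(0));
        return (first >= 'A' && first <= 'L') ? 0 : 1;
    }

    public static void main(String[] args) {
        // Hypothetical sample words, for illustration only.
        for (String word : new String[] {"apple", "Lemon", "mango", "zebra"}) {
            System.out.println(word + " -> partition " + getPartition(word, 2));
        }
    }
}
```

The guard on numPartitions matters because the returned index must always be smaller than the number of reducers; for the split to take effect, the job has to be configured with two reduce tasks.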

 

      • Now let’s have a look at the reducer. Its input key and value types must be the same as the output key and value types of the mapper.

 

      • In our case, the role of the reducer is to sum the values for each word (key).
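The summing step can be sketched in plain Java as well. Again this is a standalone illustration with a hypothetical class name (SumReducerSketch) and sample data. In a real job the loop would live in the reduce method of a class extending Hadoop's Reducer, which receives a key together with an Iterable of all the values emitted for it.

```java
import java.util.Arrays;

// Plain-Java sketch of the reduce step; no Hadoop classes are used.
final class SumReducerSketch {

    // Adds up all the counts that were emitted for one word.
    static int reduce(String word, Iterable<Integer> counts) {
        int sum = 0;
        for (int count : counts) {
            sum += count;
        }
        return sum;
    }

    public static void main(String[] args) {
        // Hypothetical input: the word "hadoop" was emitted three times by the mappers.
        System.out.println("hadoop\t" + reduce("hadoop", Arrays.asList(1, 1, 1)));
    }
}
```

Because the partitioner sends every occurrence of a given word to the same reducer, each reducer sees the complete list of values for its words and the sums are final.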

 

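      • Export the project as a runnable jar with WordCountJob as the main class, then open a terminal and run the job with the hadoop jar command. For example, if you named the jar lab0.jar, the command starts with hadoop jar lab0.jar, followed by whatever arguments your job class expects (typically the HDFS input and output paths).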

 

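      • You can look at the result by printing the files in the job's HDFS output directory, for example with hdfs dfs -cat. There should be one part file per reducer.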


Author: Ayman Ben Amor
