Tutorial 0: Hadoop MapReduce Partitioner

  • We will start with a simple Hadoop job. Suppose we have a big file containing many words separated by whitespace, and we want to count the number of appearances of each word. We also want the words starting with a letter in [A-L] to go to the first output part and all the other words to the second part.


  • I can hear you asking: why MapReduce, when I could do this with a sequential Java program? Fine, but how long would that program take to produce the result from a file larger than 4 GB, for example?


  • Let’s start with the Hadoop installation by following this link.


    • Then start the Hadoop daemons by invoking these scripts:


  • One last step before starting: you need to create some directories in HDFS and then copy the input files into your local Hadoop file system.


  • So download the two input files (small files, just for testing): download link


    • After that, create the paths in HDFS by invoking:


    • Then copy the files to HDFS by invoking a command like this:

For example, if you downloaded the files into Downloads/lab0/inputs/, the command line would be:


  • Now that everything is set up, let’s start coding. First, create a job class that extends the Configured class and implements the Tool interface. This class gives the job all the information it needs: the input format, the output format, the mapper, the reducer, the output key and value types of the mapper and the reducer, and so on.


  • Now let’s have a look at the mapper. Before digging into the code, a short explanation will help.


  • In our case, the role of the mapper is to emit each word as a key with 1 as its value.
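
To make the map step concrete, here is a minimal sketch in plain Java, with no Hadoop dependency, of what the mapper computes for one line of input. The class name WordCountMapSketch and the whitespace-splitting rule are assumptions made for illustration; this is not the tutorial's actual mapper class.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical, Hadoop-free illustration of the map step.
class WordCountMapSketch {

    // Splits one input line on whitespace and emits a (word, 1) pair per token.
    static List<Map.Entry<String, Integer>> map(String line) {
        List<Map.Entry<String, Integer>> out = new ArrayList<>();
        for (String token : line.trim().split("\\s+")) {
            if (!token.isEmpty()) {              // skip the empty token produced by a blank line
                out.add(new SimpleEntry<>(token, 1));
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // Every occurrence is emitted separately; counting happens later in the reducer.
        System.out.println(map("hadoop map reduce hadoop"));
    }
}
```

In a real job, the same loop would sit inside the map method of a class extending Hadoop's Mapper, and each pair would be written to the job context instead of being collected in a list.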



  • Now let’s have a look at the partitioner. It should extend Partitioner<MapperOutPutKeyType, MapperOutPutValueType>.


  • In our case, its role is to forward each key from the mapper to a specific reducer: words in [A-L] go to the first reducer and all the others go to the second.
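
The routing rule can be sketched as a small pure function in plain Java. The class name WordPartitionSketch is hypothetical, and three choices here are assumptions rather than something the tutorial specifies: the comparison ignores case, words that do not start with a letter in [A-L] (digits included) go to the second partition, and everything falls back to partition 0 when fewer than two reducers are configured.

```java
// Hypothetical, Hadoop-free illustration of the partitioning rule.
class WordPartitionSketch {

    // Returns the index of the reducer that should receive this word:
    // 0 for words starting with A-L (case-insensitive), 1 for everything else.
    static int getPartition(String word, int numPartitions) {
        if (numPartitions < 2 || word.isEmpty()) {
            return 0;                            // only one valid target in this case
        }
        char first = Character.toUpperCase(word.charAt(0));
        return (first >= 'A' && first <= 'L') ? 0 : 1;
    }

    public static void main(String[] args) {
        for (String w : new String[] {"apple", "Lion", "mango", "zebra"}) {
            System.out.println(w + " -> partition " + getPartition(w, 2));
        }
    }
}
```

In Hadoop, this logic would go into the getPartition(key, value, numPartitions) method of the Partitioner subclass, and the job would need two reduce tasks for both partitions to be used. The returned index must always stay below numPartitions, which is why the sketch guards the single-reducer case.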


      • Now let’s have a look at the reducer. The reducer’s input key and value types (KeyInputFormat and ValueInputFormat) must be the same as the mapper’s output key and value types (KeyOutputFormat and ValueOutputFormat).


      • In our case, the role of the reducer is to sum the values for each word (key).
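
The reduce step is a plain sum over the values grouped under one key. Below is a minimal sketch in plain Java with no Hadoop dependency; the class name WordCountReduceSketch is hypothetical and the list of integers stands in for the values Hadoop would hand to the reducer for a single word.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical, Hadoop-free illustration of the reduce step.
class WordCountReduceSketch {

    // Sums all the counts that were emitted for one word.
    static int reduce(String word, List<Integer> counts) {
        int sum = 0;
        for (int count : counts) {
            sum += count;
        }
        return sum;
    }

    public static void main(String[] args) {
        // "hadoop" was emitted three times by the mappers, each time with value 1.
        System.out.println("hadoop\t" + reduce("hadoop", Arrays.asList(1, 1, 1)));
    }
}
```

In a real job, the same loop would iterate over the Iterable of values passed to the reduce method of a class extending Hadoop's Reducer, and the total would be written to the job context.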


      • Export the project as a runnable jar and specify WordCountJob as the main class, then open a terminal and run the job by invoking:


      • For example, if you name the jar lab0.jar, the command line will be:


      • You can have a look at the result by invoking:

Author: Ayman Ben Amor
