
Tutorial 0 : Hadoop Map Reduce Partitioner
- At the beginning we will start with a simple Hadoop job. Suppose that we have a big file that contains many words separated by white space, and we want to count the number of appearances of each word. In addition, we want the words starting with [A-L] to go to the first output part and the others to the second part.
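- For example, with a toy input line such as "Apple mango apple lemon" (a made-up sample, not the downloadable input), the first output part would contain apple 2 and lemon 1, while the second part would contain mango 1.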
- Let’s start with the Hadoop installation by following this link .
- Then you should start the Hadoop daemons by invoking these scripts:
start-dfs.sh
start-yarn.sh
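- To check that the daemons are actually up, you can list the running Java processes with jps:
jps
- You should see processes such as NameNode, DataNode, SecondaryNameNode, ResourceManager and NodeManager (the exact list depends on your installation).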
- OK, one last step before starting: you need to copy the input files into HDFS, and create some directories in HDFS before copying.
- So download the two input files (small files, just for testing) : download link
- After that, create the paths in HDFS by invoking:
hdfs dfs -mkdir -p /training/lab0/inputs/
- Then, copy them to HDFS by invoking a command like this:
hdfs dfs -copyFromLocal ... /training/lab0/inputs/
For example, if you downloaded the files into ~/Downloads/lab0/inputs/, then the command line should be:
hdfs dfs -copyFromLocal ~/Downloads/lab0/inputs/* /training/lab0/inputs/
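- You can verify that the files landed in HDFS with:
hdfs dfs -ls /training/lab0/inputs/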
- First you should create a Job class that extends the Configured class and implements the Tool interface. This class gives the job all its configuration: the input format, the output format, the mapper, the reducer, the key and value output types of the mapper and reducer, etc.
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;
import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;

public class WordCountJob extends Configured implements Tool {

    @Override
    public int run(String[] args) throws Exception {
        Configuration conf = new Configuration();
        Job job = Job.getInstance(conf, "WordCountJob");

        // Give the job the name of the main class
        job.setJarByClass(WordCountJob.class);

        // Specify the input format, which determines the key and value
        // types of the mapper inputs.
        job.setInputFormatClass(TextInputFormat.class);

        // By specifying TextOutputFormat, each line of the output file will be
        // key.toString(), a tab (\t), then value.toString(), for example: word 14
        job.setOutputFormatClass(TextOutputFormat.class);

        // Specify the input paths in HDFS
        TextInputFormat.setInputPaths(job, new Path(args[0]));

        // Specify the output path in HDFS; if we run a job and give it an
        // output path that already exists, the job will fail
        TextOutputFormat.setOutputPath(job, new Path(args[1]));

        // Give the job the name of the mapper class
        job.setMapperClass(WordCountMapper.class);

        // Give the job the name of the reducer class
        job.setReducerClass(WordCountReducer.class);

        // Give the job the name of the partitioner class
        job.setPartitionerClass(WordCountPartitioner.class);

        // Give the job the number of reducers
        // The first one will treat the words in [A,L]
        // The second one will treat the others
        job.setNumReduceTasks(2);

        // Set the key output type
        job.setOutputKeyClass(Text.class);

        // Set the value output type
        job.setOutputValueClass(IntWritable.class);

        return job.waitForCompletion(true) ? 0 : 1;
    }

    public static void main(String[] args) throws Exception {
        int exitCode = ToolRunner.run(new WordCountJob(), new String[] {
                "hdfs://localhost:9000/training/lab0/inputs/*",
                "hdfs://localhost:9000/training/lab0/output/" });
        System.exit(exitCode);
    }
}
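- As noted in the comments above, the job fails if the output path already exists. If you want to re-run the job, you can remove the previous output first (assuming the output path used in this tutorial):
hdfs dfs -rm -r /training/lab0/output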
- Let’s understand how the mapper works.
- In our case the role of the mapper is to write 1 as the value for each word (as the key).
import java.io.IOException;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class WordCountMapper extends Mapper<LongWritable, Text, Text, IntWritable> {

    private final static IntWritable ONE = new IntWritable(1);

    @Override
    protected void map(LongWritable key, Text value,
            Mapper<LongWritable, Text, Text, IntWritable>.Context context)
            throws IOException, InterruptedException {

        // Split the line we received in the value by white space
        String[] words = value.toString().split(" ");

        for (int i = 0; i < words.length; i++) {
            // Write each word as the key with a 1 as the value
            context.write(new Text(words[i].toLowerCase()), ONE);
        }
    }
}
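- As an illustration (using a made-up input line "Apple banana apple"), the mapper would emit the pairs (apple, 1), (banana, 1) and (apple, 1).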
- Now let’s have a look at the partitioner. It should extend Partitioner<MapperOutputKeyType, MapperOutputValueType>.
- In our case its role is to forward each key from the mapper to a specific reducer.
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Partitioner;

public class WordCountPartitioner extends Partitioner<Text, IntWritable> {

    @Override
    public int getPartition(Text key, IntWritable value, int numPartitions) {
        // Get the first char of the word
        char decider = key.toString().toUpperCase().charAt(0);
        char A = 'A';
        char L = 'L';
        // If the first char is in [A,L] then go to the first reducer
        if ((A <= decider) && (decider <= L)) {
            return 0;
        // Else go to the second reducer
        } else {
            return 1;
        }
    }
}
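- For example, the key hadoop starts with 'H', which is in [A,L], so getPartition returns 0 and the key goes to the first reducer; the key word starts with 'W', so getPartition returns 1 and it goes to the second reducer.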
- Now let’s have a look at the reducer. The key and value input types of the reducer must match the key and value output types of the mapper.
- In our case the role of the reducer is to sum the values for each word (key).
import java.io.IOException;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

public class WordCountReducer extends Reducer<Text, IntWritable, Text, IntWritable> {

    @Override
    protected void reduce(Text key, Iterable<IntWritable> values,
            Reducer<Text, IntWritable, Text, IntWritable>.Context context)
            throws IOException, InterruptedException {

        int sum = 0;
        // Sum the values
        for (IntWritable value : values) {
            sum = sum + value.get();
        }
        // Write the word followed by the sum
        context.write(key, new IntWritable(sum));
    }
}
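- For example, for the key apple with the grouped values [1, 1], the reducer writes apple followed by a tab and 2.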
- Export the project as a runnable jar and specify WordCountJob as the main class, then open a terminal and run the job by invoking:
hadoop jar nameOfTheJar.jar
- For example, if you give the jar the name lab0.jar, then the command line will be:
hadoop jar lab0.jar
- You can have a look at the result by invoking:
hdfs dfs -ls /training/lab0/output
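- Each reducer writes its own part file (with two reducers, typically named part-r-00000 and part-r-00001). To print their contents you can use, for example:
hdfs dfs -cat /training/lab0/output/part-r-00000
hdfs dfs -cat /training/lab0/output/part-r-00001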