
Question regarding the setup method

 
Aftab Hassan
Ranch Hand
Posts: 40
Hello,
I am trying to find the word which has the maximum number of characters in a text file.

Approach (I already understand the word count problem):
1. Since I need something like a global variable (term used loosely) to compare the words in one line (one map call) with those in another, I was told that the setup method can be used to hold global variables.
2. My idea was to declare variables like int maxlengthtillnow = 0; String maxlengthwordtillnow; in the setup function and use them in each map call, i.e., to do something like:



However, this is obviously wishful thinking, since I cannot use maxlengthtillnow and maxlengthwordtillnow in the map method: they are local to the setup method.

How is this kind of requirement (where a value from one map call needs to be compared with a value from another) generally achieved?
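A minimal sketch of the usual fix, simulated in plain Java so it runs without Hadoop: declare the two variables as fields of the mapper class instead of as locals inside setup(). In Hadoop's org.apache.hadoop.mapreduce.Mapper, the framework creates one mapper object per map task and calls setup(Context) once, map(...) once per record and cleanup(Context) once, so fields keep their values across map() calls within that task (but not across tasks). The class name LongestWordMapperSketch and its simplified method signatures are hypothetical stand-ins for the real API.

```java
// Plain-Java stand-in for a Hadoop Mapper (hypothetical, simplified signatures):
// one instance per map task; setup() once, map() per line, cleanup() once.
class LongestWordMapperSketch {
    // Fields, not locals: visible to setup(), map() and cleanup() alike,
    // and they keep their values across map() calls within the same task.
    private int maxLengthTillNow;
    private String maxLengthWordTillNow;

    void setup() {
        maxLengthTillNow = 0;
        maxLengthWordTillNow = null;
    }

    // Called once per input line; compares each word with the running maximum.
    void map(String line) {
        for (String word : line.split("\\s+")) {
            if (word.length() > maxLengthTillNow) {
                maxLengthTillNow = word.length();
                maxLengthWordTillNow = word;
            }
        }
    }

    // Returns this task's longest word; a real Mapper would call context.write(...) here.
    String cleanup() {
        return maxLengthWordTillNow;
    }
}
```

Note that each map task ends up with its own longest word, so the per-task results still have to be brought together somewhere, for example in a reducer.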
 
Surendra Kumar
Ranch Hand
Posts: 236
You may use static variables, and then override the cleanup() method to write out the word with the maximum length.
 
Surendra Kumar
Ranch Hand
Posts: 236
Surendra Kumar wrote:You may use static variables, and then override the cleanup() method to write out the word with the maximum length.

What I mean is: use map to output each word and its length; then, in reduce, use static variables for the max length, compute the maximum word length from the input, and finally write the output in the cleanup() method.
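The reduce side of this suggestion might look like the plain-Java simulation below. It keeps the running maximum in instance fields (a Reducer object lives for the whole reduce task, so static is not required for the value to survive between reduce() calls) and only produces output once, in cleanup(). MaxLengthReducerSketch and its simplified signatures are hypothetical; a real reducer would extend org.apache.hadoop.mapreduce.Reducer and call context.write(...) inside cleanup(Context).

```java
import java.util.List;

// Simplified stand-in for a Hadoop Reducer receiving (word, [length, ...]) groups.
class MaxLengthReducerSketch {
    private int maxLength = -1;     // longest length seen so far in this reduce task
    private String maxWord = null;  // the word that has that length

    // Called once per distinct word with all lengths the mappers emitted for it.
    void reduce(String word, List<Integer> lengths) {
        for (int len : lengths) {
            if (len > maxLength) {
                maxLength = len;
                maxWord = word;
            }
        }
    }

    // Called once after the last reduce(); returns "word<TAB>length".
    String cleanup() {
        return maxWord + "\t" + maxLength;
    }
}
```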
 
Rajesh Nagaraju
Ranch Hand
Posts: 63
In this approach

What I mean is: use map to output each word and its length; then, in reduce, use static variables for the max length, compute the maximum word length from the input,
and finally write the output in the cleanup() method.


the variables will only give you the max for that particular map task, assuming the JVM-reuse setting is left at its default of 1 (one task per JVM).
You will still need a single reducer to get the actual max length and word across all the mappers.

My approach would be to keep a global counter for the max length, starting with the counter at the length of the first word.
If a word is longer than the current max length, write the word as the key and its length as the value, and update the counter to the new length.
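A plain-Java sketch of this mapper-side filter, assuming whitespace-separated words. One adjustment is an assumption of this sketch: the counter starts at 0 rather than at the first word's length, so the first word is itself emitted; otherwise a file whose first word is the longest would never write that word. RunningMaxMapperSketch is a hypothetical name, and the emitted list stands in for context.write.

```java
import java.util.ArrayList;
import java.util.List;

// "Emit only on a new maximum" mapper, simulated without Hadoop.
class RunningMaxMapperSketch {
    private int maxLength = 0;                                // per-task counter
    private final List<String> emitted = new ArrayList<String>(); // stands in for context.write

    void map(String line) {
        for (String word : line.split("\\s+")) {
            if (word.length() > maxLength) {
                maxLength = word.length();
                emitted.add(word + "\t" + maxLength); // key = word, value = its length
            }
        }
    }

    List<String> output() {
        return emitted;
    }
}
```

The last record each task emits is that task's longest word, and every record is strictly longer than the one before it, so far fewer records reach the reducer than if every word were written.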
Then there are two options:

Approach 1: Use a single reducer to get the max. The advantage is that the reducer has fewer records to process.
Approach 2: More complicated, but it will perform better: use the length of the word as the key, then use a custom partitioner to send ranges of lengths to different reducers.
Then find the max in each reducer; the output of the last reducer (the one receiving the highest range of lengths) will hold the max length and the word.
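Approach 2 can be simulated in plain Java as below: the word length is the key, a range partitioner sends contiguous length ranges to increasing partition numbers, and each "reducer" computes a local max. Two assumptions of this sketch: MAX_EXPECTED_LENGTH is a hypothetical upper bound used to size the ranges (longer words are clamped into the top partition), and the answer is taken from the highest non-empty partition, since the last reducer may receive no keys at all. In Hadoop the routing logic would live in a subclass of org.apache.hadoop.mapreduce.Partitioner overriding getPartition(key, value, numPartitions); RangePartitionSketch itself is hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Plain-Java simulation of Approach 2: key = word length, range partitioning,
// one local max per "reducer".
class RangePartitionSketch {
    static final int MAX_EXPECTED_LENGTH = 40; // hypothetical bound for sizing the ranges

    // Same shape as Partitioner.getPartition(key, value, numPartitions):
    // longer words never land in a lower partition than shorter ones.
    static int getPartition(int length, int numPartitions) {
        int capped = Math.min(length, MAX_EXPECTED_LENGTH);
        return capped * numPartitions / (MAX_EXPECTED_LENGTH + 1); // always < numPartitions
    }

    static String longest(List<String> words, int numPartitions) {
        // Shuffle: group words by the partition their length maps to.
        List<List<String>> partitions = new ArrayList<List<String>>();
        for (int i = 0; i < numPartitions; i++) partitions.add(new ArrayList<String>());
        for (String w : words) partitions.get(getPartition(w.length(), numPartitions)).add(w);

        // Each "reducer" finds its local max; the highest non-empty partition holds the answer.
        for (int p = numPartitions - 1; p >= 0; p--) {
            String best = null;
            for (String w : partitions.get(p)) {
                if (best == null || w.length() > best.length()) best = w;
            }
            if (best != null) return best;
        }
        return null; // no words at all
    }
}
```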



 
amit punekar
Ranch Hand
Posts: 544
Hello,
Why can't you
1) Run a mapper that outputs the word as the key and its length as the value.
2) Set the number of reducers to 1, so that all the mappers' output is passed to a single reducer, which can then scan it and output the MAX-length words.
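Steps 1) and 2) can be sketched without Hadoop as below. In a real job, step 2) corresponds to calling job.setNumReduceTasks(1) on org.apache.hadoop.mapreduce.Job. The sketch keeps every word that ties for the maximum length, matching "the MAX-length words". SingleReducerSketch and its methods are hypothetical names, and the (word, length) pairs are modelled as map entries.

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Single-reducer approach, simulated: mappers emit (word, length) and one
// reducer sees every pair.
class SingleReducerSketch {
    // Mapper side: one (word, length) pair per word in a line.
    static List<Map.Entry<String, Integer>> map(String line) {
        List<Map.Entry<String, Integer>> out = new ArrayList<Map.Entry<String, Integer>>();
        for (String w : line.split("\\s+")) {
            if (!w.isEmpty()) out.add(new AbstractMap.SimpleEntry<String, Integer>(w, w.length()));
        }
        return out;
    }

    // Reducer side: all words sharing the maximum length, in first-seen order.
    static List<String> maxLengthWords(List<Map.Entry<String, Integer>> pairs) {
        int max = -1;
        List<String> winners = new ArrayList<String>();
        for (Map.Entry<String, Integer> e : pairs) {
            int len = e.getValue();
            if (len > max) {                  // new maximum: drop shorter winners
                max = len;
                winners.clear();
                winners.add(e.getKey());
            } else if (len == max && !winners.contains(e.getKey())) {
                winners.add(e.getKey());      // tie: keep every max-length word once
            }
        }
        return winners;
    }
}
```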

I do understand that I am not addressing the "setup" question you asked. However, this way you could handle it easily and in a better manner.

As someone mentioned, you could also use the reducer as a custom combiner (similar to the standard Weather example).

Regards,
Amit
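The combiner idea relies on "keep the longest word" being associative and commutative, so the same function can run once over each mapper's output and again over those partial results, as in the max-temperature (Weather) example. A plain-Java simulation follows; CombinerSketch is a hypothetical name. One assumption worth stating: for a Hadoop combiner to shrink the data this way, the records have to share a key (for example a single constant key), because combiners, like reducers, operate on one key group at a time; with the word itself as the key, each group holds only that word's records.

```java
import java.util.ArrayList;
import java.util.List;

// Combiner sketch: the same "longest word" function runs per mapper (combiner)
// and once more over the combined results (reducer).
class CombinerSketch {
    // The reduce function: longest word in a group (first one wins on ties).
    static String longest(List<String> words) {
        String best = null;
        for (String w : words) {
            if (best == null || w.length() > best.length()) best = w;
        }
        return best;
    }

    // Per-mapper combine, then one final reduce over the combined outputs.
    static String run(List<List<String>> mapperOutputs) {
        List<String> combined = new ArrayList<String>();
        for (List<String> out : mapperOutputs) {
            String local = longest(out);     // combiner: at most one record per mapper
            if (local != null) combined.add(local);
        }
        return longest(combined);            // reducer
    }
}
```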
 
Rajesh Nagaraju
Ranch Hand
Posts: 63
amit punekar wrote:Hello,
Why can't you
1) Run a mapper that outputs the word as the key and its length as the value.
2) Set the number of reducers to 1, so that all the mappers' output is passed to a single reducer, which can then scan it and output the MAX-length words.

I do understand that I am not addressing the "setup" question you asked. However, this way you could handle it easily and in a better manner.

As someone mentioned, you could also use the reducer as a custom combiner (similar to the standard Weather example).

Regards,
Amit


This is Approach 1, which I mentioned; the limitation is that you have only one reducer, which could end up with a lot of work to do and
hence hurt performance. I did not mention a combiner because we don't need one: each mapper's output is just the longest word
and its max length.
 