
How can the number of reducers be customizable?

 
Vyapak Strot
Greenhorn
Posts: 2
Hi,

I recently started experimenting with Hadoop, so I am a beginner.

I read that one can set the number of reducers in the code, but I don't get the point. As per my understanding, each reducer is dedicated to a unique key group. So each reducer will receive all the values related to a particular key (assuming the default partitioner). That implies that the number of reducers required (and this will be known at run time) depends purely on the number of unique keys in the input data. The job will know this only when it starts reading the input file, and hence this information (the number of unique keys) won't be available in advance. If that's the case, how come programmers are allowed to set a fixed number of reducers in advance?

Thanks,
Vyapak
 
Vijaypal Singh
Ranch Hand
Posts: 34
Hi Vyapak,

Your understanding, "So each reducer will receive all the values related to a particular key (assuming the default partitioner). That implies that the number of reducers required (and this will be known at run time) depends purely on the number of unique keys in the input data.", is correct.
The number of reducers needs to be decided as part of the design, for many reasons. For example:
* You need a fixed number of outputs generated from a MapReduce job.
* You want to keep the number of reducers small compared to the number of mappers, for performance reasons.
* Many other similar reasons.

So the decision to define a fixed number of reducers can depend on many reasons, including the ones listed above.
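
To make the "fixed number of outputs" point concrete, here is a minimal sketch in plain Java (no Hadoop dependency). It assumes the default file-based output format of the newer MapReduce API, where each reducer writes one file named like part-r-00000. The class and method names are made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;

public class ReducerOutputNames {

    // Assumption: one output file per reducer, named "part-r-" plus a
    // zero-padded, five-digit reducer index.
    static List<String> outputFileNames(int numReduceTasks) {
        List<String> names = new ArrayList<>();
        for (int i = 0; i < numReduceTasks; i++) {
            names.add(String.format("part-r-%05d", i));
        }
        return names;
    }

    public static void main(String[] args) {
        // With 3 reducers the job directory would hold 3 part files,
        // regardless of how many unique keys the input contains.
        for (String name : outputFileNames(3)) {
            System.out.println(name);
        }
    }
}
```

So choosing the reducer count also fixes how many output files downstream steps have to pick up.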

To add to the above, with reference to your next query:
"The job will know this only when it starts reading the input file, and hence this information (the number of unique keys) won't be available in advance. If that's the case, how come programmers are allowed to set a fixed number of reducers in advance?"

The number of reducers is a design-time decision. The real divide and conquer happens in the map phase, which is governed by the input splits.

Hope this answers your query.
 
Monica Shiralkar
Ranch Hand
Posts: 873
"The number of reducers needs to be decided as part of the design, for many reasons."

If the number of reducers = the number of unique keys in the input data,

then how can "the number of reducers needs to be decided as part of the design, for many reasons"?
 
chris webster
Bartender
Posts: 2407
Monica Shiralkar wrote:
The number of reducers needs to be decided as part of the design, for many reasons.

If the number of reducers = the number of unique keys in the input data

It doesn't. You might have 1000 unique keys, and 10 reducers, so each reducer might handle 100 keys on average.
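
A rough sketch of why that works, in plain Java with no Hadoop dependency: the default HashPartitioner picks a reducer as (key.hashCode() & Integer.MAX_VALUE) % numReduceTasks, so every key lands on exactly one reducer, but one reducer can own many keys. The class name, method names and the "key-N" strings below are made up for illustration; only the partition arithmetic mirrors Hadoop.

```java
import java.util.Arrays;

public class ReducerAssignmentDemo {

    // Same arithmetic as Hadoop's default HashPartitioner: clear the sign
    // bit of the hash, then take the remainder modulo the reducer count.
    static int partitionFor(String key, int numReduceTasks) {
        return (key.hashCode() & Integer.MAX_VALUE) % numReduceTasks;
    }

    // Counts how many distinct keys are routed to each reducer.
    static int[] countKeysPerReducer(int numKeys, int numReduceTasks) {
        int[] counts = new int[numReduceTasks];
        for (int i = 0; i < numKeys; i++) {
            String key = "key-" + i; // hypothetical key names
            counts[partitionFor(key, numReduceTasks)]++;
        }
        return counts;
    }

    public static void main(String[] args) {
        // 1000 unique keys spread over 10 reducers; the exact split
        // depends on the hash codes, but the counts always sum to 1000.
        int[] counts = countKeysPerReducer(1000, 10);
        System.out.println(Arrays.toString(counts));
    }
}
```

The same key always maps to the same reducer, which is what guarantees that all values for a key are reduced together. The reducer count itself is just a parameter you choose (in a real job, via Job.setNumReduceTasks).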
 