
MapReduce Combiner without Shuffle?

 
andre mantei
Greenhorn
Posts: 5
Hi everyone,

I have a question about the data flow in MapReduce jobs.

Data flow in a WordCount example (taken from a book):

The mapper produces key-value pairs like (car, 1),
possibly many of these pairs with the same key.

The shuffle phase produces key/list-of-values pairs like (car, [1, 1, 1]).

The reducer sums the list and produces a key-value pair like (car, 3).
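
To make the flow above concrete, here is a minimal plain-Java sketch. It uses no Hadoop classes; the class name `WordCountFlow` and its method names are made up for illustration, and a single in-memory `TreeMap` stands in for the real distributed shuffle.

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical stand-in for the three phases; not Hadoop API.
class WordCountFlow {

    // Map step: emit one (word, 1) pair per word in the input line.
    static List<Map.Entry<String, Integer>> map(String line) {
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        for (String word : line.trim().split("\\s+")) {
            if (!word.isEmpty()) {
                pairs.add(new AbstractMap.SimpleEntry<>(word, 1));
            }
        }
        return pairs;
    }

    // Shuffle step: group all values by key, with the keys kept in sorted order.
    static Map<String, List<Integer>> shuffle(List<Map.Entry<String, Integer>> pairs) {
        Map<String, List<Integer>> grouped = new TreeMap<>();
        for (Map.Entry<String, Integer> pair : pairs) {
            grouped.computeIfAbsent(pair.getKey(), k -> new ArrayList<>()).add(pair.getValue());
        }
        return grouped;
    }

    // Reduce step: sum the list of values for each key.
    static Map<String, Integer> reduce(Map<String, List<Integer>> grouped) {
        Map<String, Integer> totals = new TreeMap<>();
        for (Map.Entry<String, List<Integer>> e : grouped.entrySet()) {
            int sum = 0;
            for (int v : e.getValue()) {
                sum += v;
            }
            totals.put(e.getKey(), sum);
        }
        return totals;
    }

    public static void main(String[] args) {
        Map<String, List<Integer>> grouped = shuffle(map("car bike car car"));
        System.out.println(grouped);         // {bike=[1], car=[1, 1, 1]}
        System.out.println(reduce(grouped)); // {bike=1, car=3}
    }
}
```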

When I want to use a combiner, I can use the reducer as the combiner (according to
the book I've been learning from).
But how is this possible? If I use the reducer as a combiner, there has to be a
shuffle phase before the combiner, right? Without a shuffle phase there is no
list of 1's, and the combiner could not sum the values for a specific key.

Clearly I am missing something. Can someone please explain it to me?

Greetings, Andre
 
Rajesh Nagaraju
Ranch Hand
Posts: 63
The book most likely means that you can reuse the reducer's code as the combiner, not that the reduce phase itself runs there.

When a mapper starts outputting data, it first stores it in a circular in-memory buffer. When the buffer
reaches a threshold (configurable), it starts to spill its contents to disk. Before each spill the buffered records
are sorted by key, so the values for a given key already sit next to each other on the map side, and no shuffle is needed for that.

The combiner is not a reducer, because it runs inside the map task. What it does is combine the mapper's
spilled output locally. The combiner is not guaranteed to run; when the spill files are merged it needs a minimum number of spill files (again configurable). It also need
not run exactly once; depending on the number of spill files, the combiner can run multiple times.
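
Here is a rough plain-Java sketch of that map-side behaviour. It is only a simulation under simplifying assumptions: the class `CombinerSketch`, its methods, and the fixed `bufferSize` are invented for illustration and are not Hadoop API, the combiner is applied once per spill, and the merge of spill files is left out. It shows that summing works as a combiner because the reducer gets the same totals whether or not the partial sums were computed first.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical simulation of one map task with spills; not Hadoop API.
class CombinerSketch {

    // The shared "reducer code": sum all values seen for one key.
    static int sum(List<Integer> values) {
        int total = 0;
        for (int v : values) {
            total += v;
        }
        return total;
    }

    // One spill: sort and group the buffered words by key, then optionally
    // replace each value list with a single partial sum (the combiner).
    static Map<String, List<Integer>> spill(List<String> buffer, boolean useCombiner) {
        Map<String, List<Integer>> grouped = new TreeMap<>(); // keys stay sorted
        for (String word : buffer) {
            grouped.computeIfAbsent(word, k -> new ArrayList<>()).add(1);
        }
        if (!useCombiner) {
            return grouped;
        }
        Map<String, List<Integer>> combined = new TreeMap<>();
        for (Map.Entry<String, List<Integer>> e : grouped.entrySet()) {
            List<Integer> partial = new ArrayList<>();
            partial.add(sum(e.getValue()));
            combined.put(e.getKey(), partial);
        }
        return combined;
    }

    // Map task: buffer the words and spill every time the buffer is full.
    static List<Map<String, List<Integer>>> runMapper(String[] words, int bufferSize,
                                                      boolean useCombiner) {
        List<Map<String, List<Integer>>> spills = new ArrayList<>();
        List<String> buffer = new ArrayList<>();
        for (String w : words) {
            buffer.add(w);
            if (buffer.size() == bufferSize) {
                spills.add(spill(buffer, useCombiner));
                buffer = new ArrayList<>();
            }
        }
        if (!buffer.isEmpty()) {
            spills.add(spill(buffer, useCombiner));
        }
        return spills;
    }

    // Reduce side: collect the values per key across all spills and sum them.
    static Map<String, Integer> reduce(List<Map<String, List<Integer>>> spills) {
        Map<String, List<Integer>> merged = new TreeMap<>();
        for (Map<String, List<Integer>> s : spills) {
            for (Map.Entry<String, List<Integer>> e : s.entrySet()) {
                merged.computeIfAbsent(e.getKey(), k -> new ArrayList<>()).addAll(e.getValue());
            }
        }
        Map<String, Integer> totals = new TreeMap<>();
        for (Map.Entry<String, List<Integer>> e : merged.entrySet()) {
            totals.put(e.getKey(), sum(e.getValue()));
        }
        return totals;
    }

    public static void main(String[] args) {
        String[] words = {"car", "bike", "car", "car", "bike", "car"};
        List<Map<String, List<Integer>>> plain = runMapper(words, 3, false);
        List<Map<String, List<Integer>>> combined = runMapper(words, 3, true);
        System.out.println(plain);            // [{bike=[1], car=[1, 1]}, {bike=[1], car=[1, 1]}]
        System.out.println(combined);         // [{bike=[1], car=[2]}, {bike=[1], car=[2]}]
        System.out.println(reduce(plain));    // {bike=2, car=4}
        System.out.println(reduce(combined)); // {bike=2, car=4}
    }
}
```

Since the combiner may run zero, one, or several times, this only works for operations like sum where combining partial results gives the same answer as combining everything at once.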

Hope that helps
 