Big Moose Saloon (JavaRanch), Hadoop forum
MapReduce Combiner without Shuffle?

andre mantei (Greenhorn, joined Sep 12, 2013, 5 posts)
Hi everyone,

I have a question about the data flow in MapReduce jobs.

Data flow in a WordCount example (taken from a book):

The mapper produces key-value pairs like (car, 1), possibly many pairs with the same key.

The shuffle phase produces key-list-of-values pairs like (car, [1, 1, 1]).

The reducer sums the list and produces a key-value pair like (car, 3).
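
The flow described above can be sketched in plain Java, without any Hadoop classes. This is only a toy simulation: the class and method names (WordCountFlow, map, shuffle, reduce) are hypothetical, and a TreeMap stands in for the sort-and-group work the framework does during the shuffle.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/**
 * Toy, plain-Java simulation of the WordCount data flow (map -> shuffle -> reduce).
 * No Hadoop classes are used; all names here are hypothetical.
 */
public class WordCountFlow {

    /** Map step: emit (word, 1) for every word in the line. */
    static List<Map.Entry<String, Integer>> map(String line) {
        List<Map.Entry<String, Integer>> out = new ArrayList<>();
        for (String word : line.split("\\s+")) {
            if (!word.isEmpty()) {          // split can yield "" for leading whitespace
                out.add(Map.entry(word, 1));
            }
        }
        return out;
    }

    /** Shuffle step (simulated): group all values by key; TreeMap keeps keys sorted. */
    static TreeMap<String, List<Integer>> shuffle(List<Map.Entry<String, Integer>> pairs) {
        TreeMap<String, List<Integer>> grouped = new TreeMap<>();
        for (Map.Entry<String, Integer> p : pairs) {
            grouped.computeIfAbsent(p.getKey(), k -> new ArrayList<>()).add(p.getValue());
        }
        return grouped;
    }

    /** Reduce step: sum the list of values for one key (key kept to mirror a reducer's input). */
    static int reduce(String key, List<Integer> values) {
        int sum = 0;
        for (int v : values) {
            sum += v;
        }
        return sum;
    }

    public static void main(String[] args) {
        List<Map.Entry<String, Integer>> mapped = map("car bus car car");
        System.out.println(mapped);   // [car=1, bus=1, car=1, car=1]
        TreeMap<String, List<Integer>> grouped = shuffle(mapped);
        System.out.println(grouped);  // {bus=[1], car=[1, 1, 1]}
        grouped.forEach((k, vs) -> System.out.println(k + " -> " + reduce(k, vs)));
    }
}
```

Running main prints the mapped pairs, the grouped lists, and then bus -> 1 and car -> 3.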

When I want to use a combiner, I can use the reducer as the combiner (according to
the book I've been learning from).
But how is this possible? If the reducer is used as a combiner, there has to be a
shuffle phase before the combiner, right? Without a shuffle phase there is no
list of 1s, and the combiner could not sum the values for a specific key.

Clearly I am missing something. Can someone please explain it to me?

Greetings, Andre

Rajesh Nagaraju (Ranch Hand, joined Nov 27, 2003, 57 posts)
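The book is saying that the combiner can use the same code as the reducer, not that the reducer itself runs at that point. In the Hadoop tutorial's WordCount, for instance, the same IntSumReducer class is registered with both job.setCombinerClass() and job.setReducerClass().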
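
A plain-Java sketch of what reusing the reducer's code as a combiner means: the same summing function first produces partial sums per mapper (combiner step), then totals those partial sums (reducer step). This is a toy model, not Hadoop's API; ReuseReducerAsCombiner, sumValues, sumByKey, withCombiner and withoutCombiner are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/**
 * Toy illustration (not Hadoop's API): one summing function serves as both the
 * combiner and the reducer. All class and method names are hypothetical.
 */
public class ReuseReducerAsCombiner {

    /** The shared "reducer code": sum the values seen for one key. */
    static int sumValues(List<Integer> values) {
        int sum = 0;
        for (int v : values) {
            sum += v;
        }
        return sum;
    }

    /** Group pairs by key, then apply sumValues to each group. */
    static TreeMap<String, Integer> sumByKey(List<Map.Entry<String, Integer>> pairs) {
        TreeMap<String, List<Integer>> grouped = new TreeMap<>();
        for (Map.Entry<String, Integer> p : pairs) {
            grouped.computeIfAbsent(p.getKey(), k -> new ArrayList<>()).add(p.getValue());
        }
        TreeMap<String, Integer> totals = new TreeMap<>();
        grouped.forEach((k, vs) -> totals.put(k, sumValues(vs)));
        return totals;
    }

    /** No combiner: all raw (word, 1) pairs from every mapper go to the reducer. */
    static TreeMap<String, Integer> withoutCombiner(List<List<Map.Entry<String, Integer>>> mapOutputs) {
        List<Map.Entry<String, Integer>> all = new ArrayList<>();
        mapOutputs.forEach(all::addAll);
        return sumByKey(all);
    }

    /** With combiner: sum per mapper first, then reduce the partial sums with the same code. */
    static TreeMap<String, Integer> withCombiner(List<List<Map.Entry<String, Integer>>> mapOutputs) {
        List<Map.Entry<String, Integer>> partials = new ArrayList<>();
        for (List<Map.Entry<String, Integer>> out : mapOutputs) {
            sumByKey(out).forEach((k, v) -> partials.add(Map.entry(k, v)));  // combiner step
        }
        return sumByKey(partials);                                           // reducer step
    }

    public static void main(String[] args) {
        List<List<Map.Entry<String, Integer>>> mapOutputs = List.of(
                List.of(Map.entry("car", 1), Map.entry("car", 1), Map.entry("bus", 1)),  // mapper 1
                List.of(Map.entry("car", 1), Map.entry("bus", 1)));                     // mapper 2
        System.out.println(withoutCombiner(mapOutputs)); // {bus=2, car=3}
        System.out.println(withCombiner(mapOutputs));    // {bus=2, car=3}
    }
}
```

Both calls print {bus=2, car=3}. This only works because addition is associative and commutative; a reducer that, say, averaged its values could not be reused unchanged as a combiner.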

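When a mapper starts emitting data, it first stores it in an in-memory circular buffer. When the buffer
reaches a threshold (configurable; in Hadoop 2 and later via mapreduce.task.io.sort.mb and mapreduce.map.sort.spill.percent), it starts to spill its contents to disk.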
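
A toy model of the buffer-and-spill behaviour described above. It assumes a single reducer (so no partitioning), counts capacity in records rather than bytes, and spills synchronously instead of in a background thread. Following Hadoop, each spill is sorted by key before it is written. SpillBuffer, collect, spill and close are hypothetical names; spillPercent only plays the role of the configurable threshold.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

/**
 * Toy model of the map-side output buffer (hypothetical names, not Hadoop's API).
 * Capacity is counted in records (Hadoop counts bytes), and spills happen
 * synchronously (Hadoop spills in a background thread).
 */
public class SpillBuffer {
    private final int spillThreshold;  // number of buffered records that triggers a spill
    private final List<Map.Entry<String, Integer>> buffer = new ArrayList<>();
    private final List<List<Map.Entry<String, Integer>>> spills = new ArrayList<>();

    SpillBuffer(int capacity, double spillPercent) {
        this.spillThreshold = Math.max(1, (int) (capacity * spillPercent));
    }

    /** Called for every (key, value) the mapper emits. */
    void collect(String key, int value) {
        buffer.add(Map.entry(key, value));
        if (buffer.size() >= spillThreshold) {
            spill();
        }
    }

    /** Sort the buffered records by key and write them out as one spill. */
    void spill() {
        if (buffer.isEmpty()) {
            return;
        }
        List<Map.Entry<String, Integer>> run = new ArrayList<>(buffer);
        run.sort(Map.Entry.comparingByKey());
        spills.add(run);
        buffer.clear();
    }

    /** Flush whatever is left when the map task finishes and return all spills. */
    List<List<Map.Entry<String, Integer>>> close() {
        spill();
        return spills;
    }

    public static void main(String[] args) {
        SpillBuffer out = new SpillBuffer(5, 0.8);  // threshold = 4 records
        for (String w : "car bus car train car bus".split(" ")) {
            out.collect(w, 1);
        }
        out.close().forEach(System.out::println);
        // [bus=1, car=1, car=1, train=1]
        // [bus=1, car=1]
    }
}
```

With a capacity of 5 and a threshold of 0.8, a spill is written after every 4 records, so the six words end up in two key-sorted spills.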

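The combiner is not the reducer: it runs on the map side, inside the map task, and what it combines is the mapper's
spills. Before a spill is written, the buffered pairs are sorted by key, so all the (car, 1) pairs in that spill sit next to
each other and the combiner can read them as (car, [1, 1, 1]), exactly the input a reducer expects. No shuffle is needed for
that; the shuffle only happens afterwards, when the reducers fetch the (already combined) map output.
Hadoop does not guarantee how many times the combiner runs, if at all. It can run when a spill is written and again when
the spill files are merged, but the merge-time run only happens if there is a minimum number of spill files (again
configurable, via mapreduce.map.combine.minspills, default 3).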
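
The key point can be shown with a small plain-Java model: because a spill is sorted by key, a combiner can sum each run of equal keys in a single pass, with no shuffle. The names (MapSideCombine, combine, mergeSpills, MIN_SPILLS_FOR_COMBINE) are hypothetical; the constant only mimics mapreduce.map.combine.minspills, and sorting the concatenated spills stands in for Hadoop's real merge.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

/**
 * Toy model of where the combiner runs on the map side (hypothetical names).
 * Each spill is sorted by key, so equal keys are adjacent and can be summed in
 * one pass: no shuffle is needed.
 */
public class MapSideCombine {
    // Mimics mapreduce.map.combine.minspills (default 3); an assumption of this model.
    static final int MIN_SPILLS_FOR_COMBINE = 3;

    /** Combiner: one pass over a key-sorted run, summing each group of equal keys. */
    static List<Map.Entry<String, Integer>> combine(List<Map.Entry<String, Integer>> sorted) {
        List<Map.Entry<String, Integer>> out = new ArrayList<>();
        int i = 0;
        while (i < sorted.size()) {
            String key = sorted.get(i).getKey();
            int sum = 0;
            // Consume the whole group of adjacent records that share this key.
            while (i < sorted.size() && sorted.get(i).getKey().equals(key)) {
                sum += sorted.get(i).getValue();
                i++;
            }
            out.add(Map.entry(key, sum));
        }
        return out;
    }

    /** Merge key-sorted spills; re-run the combiner only if there are enough spills. */
    static List<Map.Entry<String, Integer>> mergeSpills(List<List<Map.Entry<String, Integer>>> spills) {
        List<Map.Entry<String, Integer>> merged = new ArrayList<>();
        spills.forEach(merged::addAll);
        merged.sort(Map.Entry.comparingByKey());  // stable sort stands in for a k-way merge
        return spills.size() >= MIN_SPILLS_FOR_COMBINE ? combine(merged) : merged;
    }

    public static void main(String[] args) {
        // Two key-sorted spills from one map task, each combined as it is written.
        List<Map.Entry<String, Integer>> spill1 = combine(List.of(
                Map.entry("bus", 1), Map.entry("car", 1), Map.entry("car", 1), Map.entry("train", 1)));
        List<Map.Entry<String, Integer>> spill2 = combine(List.of(
                Map.entry("bus", 1), Map.entry("car", 1)));
        System.out.println(spill1);  // [bus=1, car=2, train=1]
        System.out.println(spill2);  // [bus=1, car=1]
        // Only 2 spills (< 3), so the merge does not combine again.
        System.out.println(mergeSpills(List.of(spill1, spill2)));
        // [bus=1, bus=1, car=2, car=1, train=1]
    }
}
```

With only two spills, the merged map output still contains car twice, and the reducer adds those partial sums to get car = 3. With three or more spills, the combiner would run again during the merge.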

Hope that helps
 