This week's book giveaway is in the OO, Patterns, UML and Refactoring forum.
We're giving away four copies of Refactoring for Software Design Smells: Managing Technical Debt and have Girish Suryanarayana, Ganesh Samarthyam & Tushar Sharma on-line!
See this thread for details.
The moose likes Hadoop and the fly likes MapReduce Combiner without Shuffle ? Big Moose Saloon
  Search | Java FAQ | Recent Topics | Flagged Topics | Hot Topics | Zero Replies
Register / Login

JavaRanch » Java Forums » Databases » Hadoop
Bookmark "MapReduce Combiner without Shuffle ?" Watch "MapReduce Combiner without Shuffle ?" New topic

MapReduce Combiner without Shuffle ?

andre mantei

Joined: Sep 12, 2013
Posts: 5
Hi everyone,

I have a question addressing the data-flow in mapreduce-jobs.

DataFlow in a WordCount-example (taken from a book):

The mapper produces key-value-pairs like (car, 1)
maybe a lot of this pairs with the same key (car, 1)

The shuffle-phase produces key-ListOfValues-pairs like (car, 1, 1, 1)

The reducer summarizes the List and produces a key-value-pair like (car, 3)

When I want to use a combiner, I can use the reducer as a combiner (regarding to
the book, I've been learning from).
But how is this possible ? When I want to use the reducer as a combiner, there has to be a
shuffle-phase before the combiner, right ? Without a shuffle-phase, there is no
List of 1's and the combiner could not sum the value for a specific key.

Clearly I am missing something, can someone please explain it to me ?

Greetings, Andre
Rajesh Nagaraju
Ranch Hand

Joined: Nov 27, 2003
Posts: 63
The book would say it is good to use the same code as the reducer and not use the reducer itself.

When a mapper starts outputting data it first stores it in a circular buffer, when the circular buffer
reaches a threshold (configurable) it starts to spill it the disk.

The combiner is not a reducer as it runs on the mapper itself, what a combiner does is it combines the mapper
spills. The combiner is not guaranteed to run, it needs a minium number of spill files (again configurable). It also need
not run once based on the number of spill files the combiner can run multiple time

Hope that helps
I’ve looked at a lot of different solutions, and in my humble opinion Aspose is the way to go. Here’s the link:
subject: MapReduce Combiner without Shuffle ?
It's not a secret anymore!