How to consolidate result of map-reduce

 
Anayonkar Shivalkar
Bartender
Posts: 1557
Hi,

This might be a very basic question - but so far I'm not able to get any concrete direction.

Is there any facility in map-reduce to consolidate the output?
For example, I have an HBase cluster where I run an M-R job (actually, it's just filtering the data, so I don't need a reducer; it is just a mapper). As of now, I'm writing this data to a log file, but the problem is that, since it is a cluster, the user (or process) has to browse through log files on various nodes (or hosts).

So - is it possible to do either of the following:
1) Populate a container from the M-R process and return the result to the client.
2) Flush that data to a single resource (e.g. a log file).

Due to business reasons, flushing the data to another HBase table is not possible.

Thanks in advance.
 
arumugarani sundaram
Greenhorn
Posts: 9
I am not sure whether this solution helps you.

You can use context.write(key, value), which writes the mapper results to the job's output path. If you want the results written in a specific format, you can write your own customized output format class and configure the job to use it; context.write will then write through that class.

If you want a detailed answer, please describe the sample input and output and share your Mapper class.

Thanks,
Arumugarani

 
Anayonkar Shivalkar
Bartender
Posts: 1557
Hi arumugarani,

Thanks for the reply and Welcome to CodeRanch!

The problem with context.write is that - as I mentioned - the M-R process runs in a cluster (with multiple hosts and nodes).

So - as per my understanding - context.write will still write the data (or whatever we want to write) to the file system of each node (i.e. wherever the M-R job is running). Correct me if I'm wrong.

What I'm looking for is a way to consolidate all the output data at one place.

Thanks.
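
A note on consolidation, offered as a sketch rather than a definitive answer: with a file-based output format, a map-only job normally writes one part-m-NNNNN file per map task under the job's single output directory (usually on HDFS), not on the local disk of each node. Those part files can then be concatenated into one file. The `hadoop fs -getmerge <src> <localdst>` command does this directly from the shell. The Java below illustrates the same idea using only the standard library, assuming the output directory has already been copied to a local path. The class name PartFileMerger and the method merge are hypothetical names chosen for this illustration; they are not part of Hadoop or HBase.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical helper (not a Hadoop class): merges local copies of
// part-* output files into a single consolidated file.
public class PartFileMerger {

    /**
     * Concatenates every file in outputDir whose name starts with "part-"
     * into mergedFile, in file-name order.
     * Returns the number of part files that were merged.
     */
    public static int merge(Path outputDir, Path mergedFile) throws IOException {
        List<Path> parts = new ArrayList<>();
        // The glob skips marker files such as _SUCCESS and hidden .crc files.
        try (DirectoryStream<Path> stream = Files.newDirectoryStream(outputDir, "part-*")) {
            for (Path p : stream) {
                parts.add(p);
            }
        }
        // Sort so that part-m-00000 comes before part-m-00001, and so on.
        Collections.sort(parts);

        try (OutputStream out = Files.newOutputStream(mergedFile)) {
            for (Path part : parts) {
                Files.copy(part, out); // append this part's bytes to the merged file
            }
        }
        return parts.size();
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 2) {
            System.err.println("usage: PartFileMerger <outputDir> <mergedFile>");
            return;
        }
        int count = merge(Paths.get(args[0]), Paths.get(args[1]));
        System.out.println("Merged " + count + " part file(s) into " + args[1]);
    }
}
```

For large outputs, merging on the client can become the bottleneck. An alternative is to run the job with a single reducer so that only one part file is produced, at the cost of funnelling all the filtered rows through one task.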
 