How to consolidate result of map-reduce

 
Bartender
Posts: 1558
5
Eclipse IDE Java Linux
Hi,

This might be a very basic question - but so far I'm not able to get any concrete direction.

Is there any facility in map-reduce to consolidate the output?
e.g. I have an HBase cluster where I run an M-R job (actually, it's just filtering data, so I don't need a reducer; it is just a mapper). As of now, I'm writing this data to a log file, but the problem is that, since it is a cluster, the user (or process) has to browse through log files on various nodes (or hosts).

So - is it possible to do either of the following:
1) Populate a container from the M-R process and return the result to the client.
2) Flush that data to a single resource (e.g. a log file).

Due to business reasons, flushing the data to another HBase table is not possible.

Thanks in advance.
 
Greenhorn
Posts: 9
I'm not sure whether this solution will help you.

You can use context.write(key, value), which writes the mapper results to the job's output path. If you want the results written in a specific format, you can write your own custom output format class and use it with context.write.

If you want a detailed answer, please describe the sample input and output and share your Mapper class.

Thanks,
Arumugarani

 
Anayonkar Shivalkar
Bartender
Posts: 1558
5
Eclipse IDE Java Linux
Hi arumugarani,

Thanks for the reply and Welcome to CodeRanch!

The problem with context.write is that, as I mentioned, the M-R process runs in a cluster (with multiple hosts and nodes).

So - as per my understanding - context.write will still write the data (or whatever we want to write) on file-system of each node (i.e. wherever M-R is running) - correct me if I'm wrong.

What I'm looking for is a way to consolidate all the output data at one place.

Thanks.
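
For the consolidation asked about above, one possible approach (an assumption on my part, not something confirmed in this thread) is to let the map-only job write through context.write to its output directory, which normally leaves one part-m-NNNNN file per map task, and then concatenate those files into a single file. For a directory on HDFS, the `hadoop fs -getmerge` shell command does this. The sketch below shows the same idea with plain java.nio against a local directory; the class name PartFileMerger and the paths passed to it are hypothetical.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical helper: merges the per-task output files of a job into one file.
public class PartFileMerger {

    // Concatenates every regular file in outputDir whose name starts with
    // "part-" into target, in file-name order. Marker files such as _SUCCESS
    // are skipped. Returns the number of part files merged.
    public static int merge(Path outputDir, Path target) throws IOException {
        List<Path> parts;
        try (Stream<Path> entries = Files.list(outputDir)) {
            parts = entries
                    .filter(Files::isRegularFile)
                    .filter(p -> p.getFileName().toString().startsWith("part-"))
                    .sorted()
                    .collect(Collectors.toList());
        }
        // Default open options create the target or truncate an existing one.
        try (OutputStream out = Files.newOutputStream(target)) {
            for (Path part : parts) {
                Files.copy(part, out); // writes the part's bytes to the open stream
            }
        }
        return parts.size();
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 2) {
            System.err.println("usage: PartFileMerger <outputDir> <targetFile>");
            return;
        }
        int merged = merge(Paths.get(args[0]), Paths.get(args[1]));
        System.out.println("Merged " + merged + " part file(s) into " + args[1]);
    }
}
```

This assumes the job's output directory has already been copied to (or mounted on) the machine where the merge runs, and that the target file lives outside that directory or does not start with "part-".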
 