• Post Reply Bookmark Topic Watch Topic
  • New Topic
programming forums Java Mobile Certification Databases Caching Books Engineering Micro Controllers OS Languages Paradigms IDEs Build Tools Frameworks Application Servers Open Source This Site Careers Other Pie Elite all forums
this forum made possible by our volunteer staff, including ...
Marshals:
  • Campbell Ritchie
  • Jeanne Boyarsky
  • Ron McLeod
  • Paul Clapham
  • Liutauras Vilda
Sheriffs:
  • paul wheaton
  • Rob Spoor
  • Devaka Cooray
Saloon Keepers:
  • Stephan van Hulst
  • Tim Holloway
  • Carey Brown
  • Frits Walraven
  • Tim Moores
Bartenders:
  • Mikalai Zaikin

Sorting using Mapreduce

 
Ranch Hand
Posts: 66
  • Mark post as helpful
  • send pies
    Number of slices to send:
    Optional 'thank-you' note:
  • Quote
  • Report post to moderator
Hi,

I want to implement distributed sorting using Map reduce in Java. From whatever I know, Merge sort will look better for distributed sorting. Are there are any other algorithms that I could use? Please guide.

Thanks,
Madhu
 
Author and all-around good cowpoke
Posts: 13078
6
  • Mark post as helpful
  • send pies
    Number of slices to send:
    Optional 'thank-you' note:
  • Quote
  • Report post to moderator
Note that using Mapreduce will involve two separate phases and thus two algorithm choices.

Obviously the phase which collects and merges separate sort jobs will have to be a merge but the separate sort jobs will operate just like any other sort with the same considerations.

Bill
 
Madhumitha Baskaran
Ranch Hand
Posts: 66
  • Mark post as helpful
  • send pies
    Number of slices to send:
    Optional 'thank-you' note:
  • Quote
  • Report post to moderator
Thanks.

I have some questions popping out.

I am not able to find what is the exact key over here. Out of Map function, if I need to find separate keys, it is not possible. Because if I have large amount of integers to be sorted, then key should be the same for everything for me to merge it together later at reduce phase.

Shall I do something like.. I will go through the input data set and picks few keys randomly. I will make all the integers lesser than a key to get associated with that key. I will sort each group of a key using Map function. Later I will just merge everything in the reduce phase depending on the sorting order of keys. Will this work out well?

I dont know how to use Mapreduce efficiently for large data set. Can you help me out.
 
William Brogden
Author and all-around good cowpoke
Posts: 13078
6
  • Mark post as helpful
  • send pies
    Number of slices to send:
    Optional 'thank-you' note:
  • Quote
  • Report post to moderator
I don't understand the function of "keys" in your example - sounds like an extra step for no gain.

I'm not an expert on Mapreduce but if this was my problem I would cut the data set into roughly equal portions, one for each sorting processor/thread/whatever, then merge by looking at the "top" element of each of the returned sorted arrays, take the lowest (or highest), remove it and repeat till all sources are exhausted.

Bill
 
reply
    Bookmark Topic Watch Topic
  • New Topic