• Post Reply Bookmark Topic Watch Topic
  • New Topic
programming forums Java Mobile Certification Databases Caching Books Engineering Micro Controllers OS Languages Paradigms IDEs Build Tools Frameworks Application Servers Open Source This Site Careers Other Pie Elite all forums
this forum made possible by our volunteer staff, including ...
Marshals:
  • Campbell Ritchie
  • Jeanne Boyarsky
  • Ron McLeod
  • Paul Clapham
  • Liutauras Vilda
Sheriffs:
  • paul wheaton
  • Rob Spoor
  • Devaka Cooray
Saloon Keepers:
  • Stephan van Hulst
  • Tim Holloway
  • Carey Brown
  • Frits Walraven
  • Tim Moores
Bartenders:
  • Mikalai Zaikin

TGIF problem - median of 5

 
Bartender
Posts: 10780
71
Hibernate Eclipse IDE Ubuntu
  • Mark post as helpful
  • send pies
    Number of slices to send:
    Optional 'thank-you' note:
  • Quote
  • Report post to moderator
In discussing this problem I looked up some of the possible solutions, and discovered a "median of medians" algorithm that intrigued me.

It's basis is to divide the dataset into groups of 5 and calculate the median of each group, so my first thought was: why 5? Since the next logical set for a median would be 11, I wonder if this is a throwback to the days when machines only had 8 registers available? If anybody has any other ideas, I'd love to know. And is there any advantage of using 5 over 7 as the base?

However, that aside, it got me thinking; because it's a problem that (oddly enough) I've never had to deal with:
What is the most efficient way of determining the median of specifically a group of 5 values?

I've been mulling it over for a few hours and think I'm pretty close to the best (mine involves at most 6 comparisons); but I'm interested in other suggestions, because I've been known to miss the obvious. I'm also finding it darn difficult to prove to myself that my solution works empirically.

Winston
 
Ranch Hand
Posts: 1164
Eclipse IDE Firefox Browser Java
  • Mark post as helpful
  • send pies
    Number of slices to send:
    Optional 'thank-you' note:
  • Quote
  • Report post to moderator

Winston Gutkowski wrote:In discussing this problem I looked up some of the possible solutions, and discovered a "median of medians" algorithm that intrigued me.

It's basis is to divide the dataset into groups of 5 and calculate the median of each group, so my first thought was: why 5? Since the next logical set for a median would be 11, I wonder if this is a throwback to the days when machines only had 8 registers available? If anybody has any other ideas, I'd love to know. And is there any advantage of using 5 over 7 as the base?



My first question too is why only groups of 5? Why not 3 or 7 or 9 or 11 or whatever? What is the reason for having 5 elements in a group?

Winston Gutkowski wrote:However, that aside, it got me thinking; because it's a problem that (oddly enough) I've never had to deal with:
What is the most efficient way of determining the median of specifically a group of 5 values?

I've been mulling it over for a few hours and think I'm pretty close to the best (mine involves at most 6 comparisons); but I'm interested in other suggestions, because I've been known to miss the obvious. I'm also finding it darn difficult to prove to myself that my solution works empirically.

Winston




As far as I can remember from my school days, a median is the value that lies midway between a list of sorted values. So, each sub group needs to be sorted, its median found and then a continuous group of such medians is to be formed. Then the median of these medians is to be found. This would be the pivot element. And you are attempting to do that in the least possible time. Am I correct? What is the objective of doing so much work just to find a "median of medians" which would only serve to be the "pivot" for the partition algorithm later? In other words, what I am asking is what does this algorithm achieve over and above the one that I implemented in my code?
 
Winston Gutkowski
Bartender
Posts: 10780
71
Hibernate Eclipse IDE Ubuntu
  • Mark post as helpful
  • send pies
    Number of slices to send:
    Optional 'thank-you' note:
  • Quote
  • Report post to moderator

Mansukhdeep Thind wrote:And you are attempting to do that in the least possible time. Am I correct?


Yup, and since the base group is 5, it makes sense to start with that problem; and it's not as easy as it looks.

What is the objective of doing so much work just to find a "median of medians" which would only serve to be the "pivot" for the partition algorithm later?


Because partition() does a lot of work itself, so it makes sense to make it as effective as possible. If you're unlucky enough to choose your pivots badly, the same element can get swapped many times by Quicksort (hence the worst case time of O(n^2)). The "median of medians, grouped by 5" is also extremely fast, since at each stage of recursion you're reducing by a power of 5 rather than the usual 2.

Mansukhdeep Thind wrote:what I am asking is what does this algorithm achieve over and above the one that I implemented in my code?


Well, according to the Wiki page, it improves worst-case time to O(n*log(n)) - although they also say that, in practice, it generally doesn't make much difference.

I'm also darn sure that 5 wasn't chosen by accident, and that it probably gives the best "bang for the buck" in terms of efficiency; but exactly why I honestly don't know.

Winston
 
Ranch Hand
Posts: 117
Mac Chrome Ubuntu
  • Mark post as helpful
  • send pies
    Number of slices to send:
    Optional 'thank-you' note:
  • Quote
  • Report post to moderator
Looks like having blocks of 3 would not guarantee O(n):

The reason is that by choosing blocks of 3, we might lose the guarantee of having an O(n) time algorithm.

For blocks of 5, the time complexity is

T(n) = T(n/5) + T(7n/10) + O(n)

For blocks of 3, it comes out to be

T(n) = T(n/3) + T(2n/3) + O(n)

Check this out: http://www.cs.berkeley.edu/~luca/w4231/fall99/slides/l3.pdf



Source: http://stackoverflow.com/questions/3908073/optimal-median-of-medians-selection-3-element-blocks-vs-5-element-blocks

What is the most efficient way of determining the median of specifically a group of 5 values?



You are, in fact, asking about the most efficient way to sort five elements, right? As the median will always be the third value in a sorted set of 5.

 
Winston Gutkowski
Bartender
Posts: 10780
71
Hibernate Eclipse IDE Ubuntu
  • Mark post as helpful
  • send pies
    Number of slices to send:
    Optional 'thank-you' note:
  • Quote
  • Report post to moderator

Paul Mrozik wrote:You are, in fact, asking about the most efficient way to sort five elements, right? As the median will always be the third value in a sorted set of 5.


Possibly, but I suspect not. Quickselect, which finds the kth element of n, only needs to do a partial sort (the "partition" method we've been talking about), since you already know which partition the result should be in. The problem is that the first partition of 5 requires 4 comparisons and at least 2 swaps. I'm trying to work out if there's a better alternative than Quickselect for specifically 5 items.

Just a fun little problem that I've never really thought about before.

Winston
 
reply
    Bookmark Topic Watch Topic
  • New Topic