Win a copy of Functional Reactive Programming this week in the Other Languages forum!

# TGIF problem - median of 5

Winston Gutkowski
Bartender
Posts: 10527
64
In discussing this problem I looked up some of the possible solutions, and discovered a "median of medians" algorithm that intrigued me.

It's basis is to divide the dataset into groups of 5 and calculate the median of each group, so my first thought was: why 5? Since the next logical set for a median would be 11, I wonder if this is a throwback to the days when machines only had 8 registers available? If anybody has any other ideas, I'd love to know. And is there any advantage of using 5 over 7 as the base?

However, that aside, it got me thinking; because it's a problem that (oddly enough) I've never had to deal with:
What is the most efficient way of determining the median of specifically a group of 5 values?

I've been mulling it over for a few hours and think I'm pretty close to the best (mine involves at most 6 comparisons); but I'm interested in other suggestions, because I've been known to miss the obvious. I'm also finding it darn difficult to prove to myself that my solution works empirically.

Winston

Mansukhdeep Thind
Ranch Hand
Posts: 1158
Winston Gutkowski wrote:In discussing this problem I looked up some of the possible solutions, and discovered a "median of medians" algorithm that intrigued me.

It's basis is to divide the dataset into groups of 5 and calculate the median of each group, so my first thought was: why 5? Since the next logical set for a median would be 11, I wonder if this is a throwback to the days when machines only had 8 registers available? If anybody has any other ideas, I'd love to know. And is there any advantage of using 5 over 7 as the base?

My first question too is why only groups of 5? Why not 3 or 7 or 9 or 11 or whatever? What is the reason for having 5 elements in a group?

Winston Gutkowski wrote:However, that aside, it got me thinking; because it's a problem that (oddly enough) I've never had to deal with:
What is the most efficient way of determining the median of specifically a group of 5 values?

I've been mulling it over for a few hours and think I'm pretty close to the best (mine involves at most 6 comparisons); but I'm interested in other suggestions, because I've been known to miss the obvious. I'm also finding it darn difficult to prove to myself that my solution works empirically.

Winston

As far as I can remember from my school days, a median is the value that lies midway between a list of sorted values. So, each sub group needs to be sorted, its median found and then a continuous group of such medians is to be formed. Then the median of these medians is to be found. This would be the pivot element. And you are attempting to do that in the least possible time. Am I correct? What is the objective of doing so much work just to find a "median of medians" which would only serve to be the "pivot" for the partition algorithm later? In other words, what I am asking is what does this algorithm achieve over and above the one that I implemented in my code?

Winston Gutkowski
Bartender
Posts: 10527
64
Mansukhdeep Thind wrote:And you are attempting to do that in the least possible time. Am I correct?

Yup, and since the base group is 5, it makes sense to start with that problem; and it's not as easy as it looks.

What is the objective of doing so much work just to find a "median of medians" which would only serve to be the "pivot" for the partition algorithm later?

Because partition() does a lot of work itself, so it makes sense to make it as effective as possible. If you're unlucky enough to choose your pivots badly, the same element can get swapped many times by Quicksort (hence the worst case time of O(n^2)). The "median of medians, grouped by 5" is also extremely fast, since at each stage of recursion you're reducing by a power of 5 rather than the usual 2.

Mansukhdeep Thind wrote:what I am asking is what does this algorithm achieve over and above the one that I implemented in my code?

Well, according to the Wiki page, it improves worst-case time to O(n*log(n)) - although they also say that, in practice, it generally doesn't make much difference.

I'm also darn sure that 5 wasn't chosen by accident, and that it probably gives the best "bang for the buck" in terms of efficiency; but exactly why I honestly don't know.

Winston

Paul Mrozik
Ranch Hand
Posts: 117
Looks like having blocks of 3 would not guarantee O(n):

The reason is that by choosing blocks of 3, we might lose the guarantee of having an O(n) time algorithm.

For blocks of 5, the time complexity is

T(n) = T(n/5) + T(7n/10) + O(n)

For blocks of 3, it comes out to be

T(n) = T(n/3) + T(2n/3) + O(n)

Check this out: http://www.cs.berkeley.edu/~luca/w4231/fall99/slides/l3.pdf

Source: http://stackoverflow.com/questions/3908073/optimal-median-of-medians-selection-3-element-blocks-vs-5-element-blocks

What is the most efficient way of determining the median of specifically a group of 5 values?

You are, in fact, asking about the most efficient way to sort five elements, right? As the median will always be the third value in a sorted set of 5.

Winston Gutkowski
Bartender
Posts: 10527
64
Paul Mrozik wrote:You are, in fact, asking about the most efficient way to sort five elements, right? As the median will always be the third value in a sorted set of 5.

Possibly, but I suspect not. Quickselect, which finds the kth element of n, only needs to do a partial sort (the "partition" method we've been talking about), since you already know which partition the result should be in. The problem is that the first partition of 5 requires 4 comparisons and at least 2 swaps. I'm trying to work out if there's a better alternative than Quickselect for specifically 5 items.

Just a fun little problem that I've never really thought about before.

Winston