It's basis is to divide the dataset into groups of 5 and calculate the median of each group, so my first thought was: why 5? Since the next logical set for a median would be 11, I wonder if this is a throwback to the days when machines only had 8 registers available? If anybody has any other ideas, I'd love to know. And is there any advantage of using 5 over 7 as the base?

However, that aside, it got me thinking; because it's a problem that (oddly enough) I've never had to deal with:

*What is the most efficient way of determining the*

**median**of__specifically__a group of 5 values?I've been mulling it over for a few hours and think I'm pretty close to the best (mine involves at most 6 comparisons); but I'm interested in other suggestions, because I've been known to miss the obvious. I'm also finding it darn difficult to

*prove*to myself that my solution works empirically.

Winston

"Leadership is nature's way of removing morons from the productive flow" - Dogbert

Articles by Winston can be found here

Winston Gutkowski wrote:In discussing this problem I looked up some of the possible solutions, and discovered a "median of medians" algorithm that intrigued me.

It's basis is to divide the dataset into groups of 5 and calculate the median of each group, so my first thought was: why 5? Since the next logical set for a median would be 11, I wonder if this is a throwback to the days when machines only had 8 registers available? If anybody has any other ideas, I'd love to know. And is there any advantage of using 5 over 7 as the base?

My first question too is why only groups of 5? Why not 3 or 7 or 9 or 11 or whatever? What is the reason for having 5 elements in a group?

Winston Gutkowski wrote:However, that aside, it got me thinking; because it's a problem that (oddly enough) I've never had to deal with:

What is the most efficient way of determining themedianofspecificallya group of 5 values?

I've been mulling it over for a few hours and think I'm pretty close to the best (mine involves at most 6 comparisons); but I'm interested in other suggestions, because I've been known to miss the obvious. I'm also finding it darn difficult toproveto myself that my solution works empirically.

Winston

As far as I can remember from my school days, a median is the value that lies midway between a list of sorted values. So, each sub group needs to be sorted, its median found and then a continuous group of such medians is to be formed. Then the median of these medians is to be found. This would be the pivot element. And you are attempting to do that in the least possible time. Am I correct? What is the objective of doing so much work just to find a "median of medians" which would only serve to be the "pivot" for the partition algorithm later? In other words, what I am asking is what does this algorithm achieve over and above the one that I implemented in my code?

~ Mansukh

Mansukhdeep Thind wrote:And you are attempting to do that in the least possible time. Am I correct?

Yup, and since the base group is 5, it makes sense to start with that problem; and it's

*not*as easy as it looks.

What is the objective of doing so much work just to find a "median of medians" which would only serve to be the "pivot" for the partition algorithm later?

Because partition() does a

*lot*of work itself, so it makes sense to make it as effective as possible. If you're unlucky enough to choose your pivots badly, the same element can get swapped many times by Quicksort (hence the worst case time of O(n^2)). The "median of medians, grouped by 5" is also extremely fast, since at each stage of recursion you're reducing by a power of 5 rather than the usual 2.

Mansukhdeep Thind wrote:what I am asking is what does this algorithm achieve over and above the one that I implemented in my code?

Well, according to the Wiki page, it improves

*worst-case*time to O(n*log(n)) - although they also say that, in practice, it generally doesn't make much difference.

I'm also darn sure that 5 wasn't chosen by accident, and that it probably gives the best "bang for the buck" in terms of efficiency; but exactly why I honestly don't know.

Winston

"Leadership is nature's way of removing morons from the productive flow" - Dogbert

Articles by Winston can be found here

The reason is that by choosing blocks of 3, we might lose the guarantee of having an O(n) time algorithm.

For blocks of 5, the time complexity is

T(n) = T(n/5) + T(7n/10) + O(n)

For blocks of 3, it comes out to be

T(n) = T(n/3) + T(2n/3) + O(n)

Check this out: http://www.cs.berkeley.edu/~luca/w4231/fall99/slides/l3.pdf

Source: http://stackoverflow.com/questions/3908073/optimal-median-of-medians-selection-3-element-blocks-vs-5-element-blocks

What is the most efficient way of determining the median of specifically a group of 5 values?

You are, in fact, asking about the most efficient way to sort five elements, right? As the median will always be the third value in a sorted set of 5.

Paul Mrozik wrote:You are, in fact, asking about the most efficient way to sort five elements, right? As the median will always be the third value in a sorted set of 5.

Possibly, but I suspect not. Quickselect, which finds the kth element of n, only needs to do a

*partial*sort (the "partition" method we've been talking about), since you already know which partition the result should be in. The problem is that the

*first*partition of 5 requires 4 comparisons and at least 2 swaps. I'm trying to work out if there's a better alternative than Quickselect for specifically 5 items.

Just a fun little problem that I've never really thought about before.

Winston

"Leadership is nature's way of removing morons from the productive flow" - Dogbert

Articles by Winston can be found here