~ Mansukh
Campbell Ritchie wrote:I suggest you start by working out an algorithm to find the largest element in that array without sorting it.
"Leadership is nature's way of removing morons from the productive flow" - Dogbert
Articles by Winston can be found here
~ Mansukh
~ Mansukh
Campbell Ritchie wrote:That doesn’t sound right. That sounds like a sorting algorithm. And you are still thinking at too high a level. Things like pivot sound like an implementation, not an algorithm. When you get down to saying things like “middle element”, you are more like the right level.
~ Mansukh
Campbell Ritchie wrote:You can try your algorithm, which I think is incorrect, but I am not certain.
I also think the words suggest your solving methodology (and I mean methodology not method) is also wrong. You appear (but I am not certain) to be thinking, “how do I program it?” rather than, “what do I do?”
~ Mansukh
Mansukhdeep Thind wrote:Do you mean to say my solution is incorrect or my usage of words to describe it? Or both?
"Leadership is nature's way of removing morons from the productive flow" - Dogbert
Articles by Winston can be found here
Mansukhdeep Thind wrote:This is the only way I can think of to home in on the greatest/smallest element i.e. by narrowing it down into sets. How else? Just hint at the approach and let me try again.
"Leadership is nature's way of removing morons from the productive flow" - Dogbert
Articles by Winston can be found here
~ Mansukh
Larry Frissell wrote:With that in mind, my solution (without sorting) is to consider the problem as a stack of cards with numbers on them face down.
"Leadership is nature's way of removing morons from the productive flow" - Dogbert
Articles by Winston can be found here
Campbell Ritchie wrote:That doesn’t sound right. That sounds like a sorting algorithm. And you are still thinking at too high a level. Things like pivot sound like an implementation, not an algorithm. When you get down to saying things like “middle element”, you are more like the right level.
Larry Frissell wrote:As Winston said, using pencil and paper to find a solution is the best way to solve the problem. With that in mind, my solution (without sorting) is to consider the problem as a stack of cards with numbers on them face down. I can look at a card and decide to keep it or discard it. So I could keep the first three. After that each card I pick up I would then make the decision to keep the card or discard based on whether one of the cards I am holding is a lower value then the one I am looking at. After looking at each card in the stack once, I am holding only the three highest cards. Then it is simply a matter of selecting the smallest of the three.
~ Mansukh
Mike Simmons wrote:Things like pivot sound like an algorithm to me.
In fact Mansukhdeep seems to be describing the algorithm that comes to my mind for doing this in O(n) time, as opposed to O(n * log(n)), which a full sort would require. And here I am assuming that k can be proportional to n, so any part of the algorithm that depends on k also depends on n. Many naive solutions of this might be O(n*k) which is O(n^2) as far as I'm concerned.
A lot seems to depend on what is meant by "without doing a sort".
If there's also a restriction on memory, this might be a problem.
Anyway, Mansukhdeep - it sounds to me like you are on the right track, as I understand the problem.
"Leadership is nature's way of removing morons from the productive flow" - Dogbert
Articles by Winston can be found here
Mansukhdeep Thind wrote:b) I will then pick up the 563rd card and compare its value with first 562 cards. Moment I find that this picked up card has a larger value than any one of existing cards in the list, I chuck that out and put this one in. Correct?
"Leadership is nature's way of removing morons from the productive flow" - Dogbert
Articles by Winston can be found here
Yes. Specifically, a sorting algorithm on the lines of Quicksort; which might help you pick out the largest or smallest quickly, but I'm pretty sure it would need to be repeated or involve lots of comparisons between groups to find the nth (or kth) largest.
~ Mansukh
Mike Simmons wrote:
Winston Gutkowski wrote:
If there's also a restriction on memory, this might be a problem.
True, although you could probably do it in place by partitioning the array into subsections of length n and n-k and swapping (although if that was the case I'd really want to do a sort on the k section first ).
Partitioning in place was used in the original algorithm Mansukhdeep described. Here I was describing alternatives in case there was a prohibition on changing the input data, which would disallow partitioning in place.
~ Mansukh
Mike Simmons wrote:
Mike Simmons wrote:
Winston Gutkowski wrote:
If there's also a restriction on memory, this might be a problem.
True, although you could probably do it in place by partitioning the array into subsections of length n and n-k and swapping (although if that was the case I'd really want to do a sort on the k section first ).
Partitioning in place was used in the original algorithm Mansukhdeep described. Here I was describing alternatives in case there was a prohibition on changing the input data, which would disallow partitioning in place.
OK, maybe it wasn't explicitly mentioned, and isn't present in the code shown so far. But we both agree it can be added, if modifying the input data is allowed.
Winston Gutkowski wrote:
Mike Simmons wrote:Things like pivot sound like an algorithm to me.
Yes. Specifically, a sorting algorithm on the lines of Quicksort; which might help you pick out the largest or smallest quickly, but I'm pretty sure it would need to be repeated or involve lots of comparisons between groups to find the nth (or kth) largest.
Winston Gutkowski wrote:
Well I don't think that's right because
1. As k decreases it would approach O(n).
Winston Gutkowski wrote:2. I doubt it ever needs to be more than O(n * n/2) since if k > n/2 you simply flip the algorithm to find the (n-k)th smallest.
Winston Gutkowski wrote:
A lot seems to depend on what is meant by "without doing a sort".
I took it to mean exactly that, ie, you cannot perform a sort in the algorithm. I don't see any restriction on using ordered structures though, so I would assume that (as you mentioned) a heap or (my preference) a Skiplist would be perfectly acceptable for holding your 'array of k'.
Winston Gutkowski wrote:You're quite right though (and I was completely wrong), my algorithm would be O(n log(k)); and I suspect you could add efficiency by keeping track of the lowest/highest of k too (ie, quick elimination).
Winston Gutkowski wrote:Fun problem though.
Mike Simmons wrote:Well, looks OK so far. What if you are looking for the 2nd greatest? Or the 5th? I think as you go on, you will need to keep track of both the items greater than the pivot, and less than the pivot - and sometimes the element you want will be in the "greater" list, and sometimes in the "less" list.
Mike Simmons wrote:I don't think it really matters that you pick a pivot from halfway through the list. What you'd like, of course, is a pivot that's halfway through the sorted list - but we have no way of knowing in advance what or where that will be. In fact it's probably advantageous to choose a pivot at random, because otherwise there will be certain inputs that give pathologically bad performance. If you pick randomly, then you will get good pivot points often enough that it will compensate for the occasional bad pivot points. If you don't pick randomly, then by knowing your algorithm, I can devise an input which will give very poor performance. This is the sort of thing that can make a website vulnerable to DOS attack.
~ Mansukh
Mansukhdeep Thind wrote:Then I would first apply Larry's methodology and isolate the k greatest/smallest elements in the random list. After that, I would apply my methodology on the filtered list to find the kth smallest / largest element.
That should do it. Is there a faster way?
What do you mean when you say "make ... vulnerable to a DOS attack"?
"Leadership is nature's way of removing morons from the productive flow" - Dogbert
Articles by Winston can be found here
Winston Gutkowski wrote:
Mansukhdeep Thind wrote:Then I would first apply Larry's methodology and isolate the k greatest/smallest elements in the random list. After that, I would apply my methodology on the filtered list to find the kth smallest / largest element.
Then why not do it that way from the get go?
Winston Gutkowski wrote:
Mansukhdeep Thind wrote:That should do it. Is there a faster way?
Not that I know of. As Mike says, for finding a specific kth largest element (as opposed to "the k largest"), your approach has expected linear time (but worse case is O(n^2)). Then you can just use that element to extract the k largest with a single final pass over the array, so the overall time will still be O(n).
However, it takes a lot more thought and is much less intuitive than Larry's and my method - so it's much easier to get wrong - and it may well perform worse on small sets and/or small values of k. It's also arguable that a partition is a partial sort, and so strictly against the rules of the problem as stated; but I think that would be splitting hairs.
Winston Gutkowski wrote:
Mansukhdeep Thind wrote:What do you mean when you say "make ... vulnerable to a DOS attack"?
I think he's extrapolating a bit; but essentially: if you know the way that the pivot is chosen in advance, you can design a "nasty dataset" that causes Quicksort to behave in the worst way possible (O(n^2)), whereas if you choose the pivot at random, it's virtually impossible to design a dataset that intentionally causes bad performance; and choosing at random has very little effect on the overall performance of the algorithm itself.
Skiplists use randomization of pointer stack heights as a matter of course, which is why they are relatively immune to this sort of attack.
Winston Gutkowski wrote:However, don't ask me about the Maths, because most of it goes over my head.
Winston
~ Mansukh
~ Mansukh
Mansukhdeep Thind wrote:Is your method same as Larry's?
What do you mean by Skiplists? Are they some sort of data structures like ArrayList, LinkedList etc?
And what is meant by "randomization of pointer stack heights"?
I also need to get a grip on all these Big O notation terminologies that you guys have so nonchalantly been talking about since the beginning of this problem.
"Leadership is nature's way of removing morons from the productive flow" - Dogbert
Articles by Winston can be found here
OKWinston Gutkowski wrote:
Mansukhdeep Thind wrote:Is your method same as Larry's?
Pretty much. Larry just explained it better with his card analogy: Keep a fixed-length priority queue of the k highest items so far and keep pushing in values that are higher than the smallest item in the queue and drop off the smallest.
Winston Gutkowski wrote:
Mansukhdeep Thind wrote:What do you mean by Skiplists? Are they some sort of data structures like ArrayList, LinkedList etc?
Yup, and in the words of the French knight in Monty Python and the Holy Grail they're "ver' nice". One of the simplest log(n) structures I know of to implement (way simpler than a Tree; especially if it needs to be balanced).
Basically, it's a stack of linked lists built on top of a basic linked list that provides "fast track" access to sections. For more information, you should really read this page or William Pugh's original paper (very straightforward, even for a non-academic like myself).
Unfortunately, there's very little support for them in the Java Collections API, except as "concurrent-use" datasets (one of their other strengths), so you'll either have to find an existing library implementation or roll your own.
Winston Gutkowski wrote:
Mansukhdeep Thind wrote:And what is meant by "randomization of pointer stack heights"?
It's one of the basic premises that underpins the structure, ie, "fast paths" are generated randomly. Like Quick sort, this makes it far less vulnerable to sets of data that can cause the data set to perform poorly, while having virtually no impact on performance.
Winston Gutkowski wrote:
Mansukhdeep Thind wrote:I also need to get a grip on all these Big O notation terminologies that you guys have so nonchalantly been talking about since the beginning of this problem.
My advice: Don't sweat it too much. As you may have already gathered, Mike and I don't even agree about the basics of O(n*k), and we're both pretty experienced. The main thing is that we understand the difference between O(1) (constant time), O(n) (time proportional to the size of a set), O(log(n)) (time proportional to the log of the size of a set (usually base-2)), and "quadratic time" (O(n^2); and usually best avoided unless there's no other way ).
As Jayesh said, it's simply a guide as to how "scalable" an algorithm is, and there are many examples of O(log(n)) algorithms that are actually slower, in practise, than O(n) ones. As as an example: doing a sequential search of ≈20 items or less is likely to be as quick or quicker quicker than a binary search.
HIH
Winston
~ Mansukh
~ Mansukh
Mansukhdeep Thind wrote:Of course, I would be implementing Larry's methodology only. Is there some other way to implement it?
~ Mansukh
Mike Simmons wrote:
Mansukhdeep Thind wrote:Of course, I would be implementing Larry's methodology only. Is there some other way to implement it?
Sure, there's the way you were talking about earlier. Apparently you were only thinking of how to use it to get the greatest element, which was a problem trivially easy to solve with other methods. But I (and Winston) already gave hints how your method could be extended to get the k th element, with better performance if k can be randomly anywhere in the range (not just near the ends). You can get more info under Quickselect if you're interested.
~ Mansukh
Now this is the point at which I am trying to figure out what to do next. How to isolate the top K elements without a sort?
Mansukhdeep Thind wrote:I am stuck guys. Finding the largest element was easy. But implementing the Kth largest is thawing my brain. Let's say for example I have the following unordered set of numbers:As per Larry's methodology, I pick the first 3 elements:and we have...
"Leadership is nature's way of removing morons from the productive flow" - Dogbert
Articles by Winston can be found here
Winston Gutkowski wrote: Now: is 10 larger than the smallest of your 3 elements (6)?
~ Mansukh
Mansukhdeep Thind wrote:But Winston, I do not know which is the smallest element in my topKList. These are random numbers. The computer can't see. And we are not allowed to sort. So, how do you know which is the smallest in my assumed topKlist? Do you mean to say that I have to repetitively apply the partition select algorithm to find the smallest in the topKList after each swap until the elements in the remainingList are exhausted?
"Leadership is nature's way of removing morons from the productive flow" - Dogbert
Articles by Winston can be found here