Win a copy of The Way of the Web Tester: A Beginner's Guide to Automating Tests this week in the Testing forum!

# QuickSort Question

Jose Campana
Ranch Hand
Posts: 339
Hello there fellow programmers !

I'm currently facing the challenge of re-learning some of the fundamental sorting algorithms, perhaps the only one that's very powerful to perform ordering it's QuickSort; and that's the one I'm trying
to implement, given that I was frustrated by failure, I incorporated some code I found in my own code, now I have a Question about it, I want to know how it works in the precise part that controls the recursive part of it....

I've just placed a comment about the 2 ifs that don't make sense to me, Could anybody please help me with an insight about those two conditions, and what's the logic involved in them?

Best Regards,

Jose

fred rosenberger
lowercase baba
Bartender
Posts: 12235
36
First, I would question your statement that a quicksort is "perhaps the only one that's very powerful to perform ordering". If i recall correctly, if the dataset is already mostly sorted, quicksort is painfully slow, whereas a bubble sort would be rather peppy.

Without looking up anything (so i may be wrong on the details), quicksort picks a 'pivot' point in the array. All the other elements are then compared to the pivot. Any element less than the pivot is moved into the partition before it, and any element greater is moved into the partition after. The specific order of elements in each partition doesn't matter.

Once that is done, the 'pivot' is in it's correct spot. you then perform a quicksort on the two particians. You select a pivot in there, and move elements into their partitions, then quicksort each of THOSE...etc. So each time you call the method, you pass in the indicies of the sub-partition you want to sort. Eventually, you will pass in something like (myArray, 17, 17). When this happens, your "left < J" will fail (since you don't need to sort one element), so the recursion stops.

Jose Campana
Ranch Hand
Posts: 339
Hello fred, your explanation seems to be quite reasonable. Thanks for the insight. that's how I needed to see it.
However, now I have one more doubt.

fred rosenberger wrote:
Once that is done, the 'pivot' is in it's correct spot.

According to what I have read about QuickSort, and as you have said, after each iteration, the pivot is supposed to land in the right spot, however that's not entirely right, If you take a look at what really happens... that's NOT true for every iteration. For example....

[101, 0, 5, 90, 1000, 666, 7, 23, 69, 114, 78, 111, 54, 67, 3, 65, 1, 70, 22, ] :pivot = 114
[101, 0, 5, 90, 22, 70, 7, 23, 69, 1, 78, 111, 54, 67, 3, 65, 114, 666, 1000, ] :pivot = 23
[3, 0, 5, 1, 22, 23, 7, 70, 69, 90, 78, 111, 54, 67, 101, 65, 114, 666, 1000, ] :pivot = 1
[1, 0, 5, 3, 22, 23, 7, 70, 69, 90, 78, 111, 54, 67, 101, 65, 114, 666, 1000, ] :pivot = 1
[0, 1, 5, 3, 22, 23, 7, 70, 69, 90, 78, 111, 54, 67, 101, 65, 114, 666, 1000, ] :pivot = 22
[0, 1, 5, 3, 7, 23, 22, 70, 69, 90, 78, 111, 54, 67, 101, 65, 114, 666, 1000, ] :pivot = 3
[0, 1, 3, 5, 7, 23, 22, 70, 69, 90, 78, 111, 54, 67, 101, 65, 114, 666, 1000, ] :pivot = 5
[0, 1, 3, 5, 7, 23, 22, 70, 69, 90, 78, 111, 54, 67, 101, 65, 114, 666, 1000, ] :pivot = 23
[0, 1, 3, 5, 7, 22, 23, 70, 69, 90, 78, 111, 54, 67, 101, 65, 114, 666, 1000, ] :pivot = 111
[0, 1, 3, 5, 7, 22, 23, 70, 69, 90, 78, 65, 54, 67, 101, 111, 114, 666, 1000, ] :pivot = 78
[0, 1, 3, 5, 7, 22, 23, 70, 69, 67, 54, 65, 78, 90, 101, 111, 114, 666, 1000, ] :pivot = 67
[0, 1, 3, 5, 7, 22, 23, 65, 54, 67, 69, 70, 78, 90, 101, 111, 114, 666, 1000, ] :pivot = 65
[0, 1, 3, 5, 7, 22, 23, 54, 65, 67, 69, 70, 78, 90, 101, 111, 114, 666, 1000, ] :pivot = 69
[0, 1, 3, 5, 7, 22, 23, 54, 65, 67, 69, 70, 78, 90, 101, 111, 114, 666, 1000, ] :pivot = 90
[0, 1, 3, 5, 7, 22, 23, 54, 65, 67, 69, 70, 78, 90, 101, 111, 114, 666, 1000, ] :pivot = 666
[0, 1, 3, 5, 7, 22, 23, 54, 65, 67, 69, 70, 78, 90, 101, 111, 114, 666, 1000, ]

In the output above, take a look at the highlighted text, result of iterating using 23 as the pivot. You can notice that 7 comes After 23.
And cases like this happen ALL the time with this algorithm, and it drove me crazy !

Could you please tell me Why this does NOT impact the end result (ordered Array).

Good Luck,

Jose

fred rosenberger
lowercase baba
Bartender
Posts: 12235
36
Short answer: I don't know, and I can't dig into it.

Can you additionally print out your left and right values for each call? Remeber, when you pick 23 as the pivot, you are not sorting the whole array, but some sub-part of it. So when you pick 23 as the pivot, you may not be working with the part of the array that holds the '7'. that's just a guess...

Jose Campana
Ranch Hand
Posts: 339
Hello Fred,

[101, 0, 5, 90, 1000, 666, 7, 23, 69, 114, 78, 111, 54, 67, 3, 65, 1, 70, 22, ] :pivot = 114, left = 101, right = 22
[101, 0, 5, 90, 22, 70, 7, 23, 69, 1, 78, 111, 54, 67, 3, 65, 114, 666, 1000, ] :pivot = 23, left = 101, right = 65
[3, 0, 5, 1, 22, 23, 7, 70, 69, 90, 78, 111, 54, 67, 101, 65, 114, 666, 1000, ]
:pivot = 1, left = 3, right = 7
[1, 0, 5, 3, 22, 23, 7, 70, 69, 90, 78, 111, 54, 67, 101, 65, 114, 666, 1000, ] :pivot = 1, left = 1, right = 0
[0, 1, 5, 3, 22, 23, 7, 70, 69, 90, 78, 111, 54, 67, 101, 65, 114, 666, 1000, ] :pivot = 22, left = 5, right = 7
[0, 1, 5, 3, 7, 23, 22, 70, 69, 90, 78, 111, 54, 67, 101, 65, 114, 666, 1000, ] :pivot = 3, left = 5, right = 7
[0, 1, 3, 5, 7, 23, 22, 70, 69, 90, 78, 111, 54, 67, 101, 65, 114, 666, 1000, ] :pivot = 5, left = 5, right = 7
[0, 1, 3, 5, 7, 23, 22, 70, 69, 90, 78, 111, 54, 67, 101, 65, 114, 666, 1000, ] :pivot = 23, left = 23, right = 22
[0, 1, 3, 5, 7, 22, 23, 70, 69, 90, 78, 111, 54, 67, 101, 65, 114, 666, 1000, ] :pivot = 111, left = 70, right = 65
[0, 1, 3, 5, 7, 22, 23, 70, 69, 90, 78, 65, 54, 67, 101, 111, 114, 666, 1000, ] :pivot = 78, left = 70, right = 101
[0, 1, 3, 5, 7, 22, 23, 70, 69, 67, 54, 65, 78, 90, 101, 111, 114, 666, 1000, ] :pivot = 67, left = 70, right = 65
[0, 1, 3, 5, 7, 22, 23, 65, 54, 67, 69, 70, 78, 90, 101, 111, 114, 666, 1000, ] :pivot = 65, left = 65, right = 54
[0, 1, 3, 5, 7, 22, 23, 54, 65, 67, 69, 70, 78, 90, 101, 111, 114, 666, 1000, ] :pivot = 69, left = 69, right = 70
[0, 1, 3, 5, 7, 22, 23, 54, 65, 67, 69, 70, 78, 90, 101, 111, 114, 666, 1000, ] :pivot = 90, left = 78, right = 101
[0, 1, 3, 5, 7, 22, 23, 54, 65, 67, 69, 70, 78, 90, 101, 111, 114, 666, 1000, ] :pivot = 666, left = 114, right = 1000
[0, 1, 3, 5, 7, 22, 23, 54, 65, 67, 69, 70, 78, 90, 101, 111, 114, 666, 1000, ]

I hope you or somebody else can give me some answers, I was frustrated by confusion yesterday.... I want to get this right.

Thanks again,

Jose

fred rosenberger
lowercase baba
Bartender
Posts: 12235
36
maybe your logic is off a little, and you just got lucky that it sorted right? I don't have the psuedocode your working from, but following what you have written, when we pick 23 as the pivot, we are sorting this:

since 101 is greater than 23, we leave i where it is. since 65 is greater than 23, we move j down to 3. we then swap those elements, increment i and decrement j, giving us this:

i now slides up to 90, and j down to 1. we swap those:

i slides up to 70, j down to 23, and we swap

and we decrement/increment i & j, and we end.

i'm wondering if we need some logic about not letting j get less than the pivot (and i not greater than the pivot)... The algorithm listed on the wikipedia talks about temporarily moving the pivot to the end of the array "so it doesn't get in the way". That may be happening here.

Best I can do right now. I need to go put my daughter to be.

ok...we don't seem to want to preserve my spacing. ignore where my i's and j's show up. It might work if you copy it all to a monospace font.

Ulrika Tingle
Ranch Hand
Posts: 92
fred rosenberger wrote:If i recall correctly, if the dataset is already mostly sorted, quicksort is painfully slow, whereas a bubble sort would be rather peppy.

I don't think you recall correctly. Quicksort is never painfully slow. If you're extremely unlucky (when picking pivot elements) the complexity can degrade to O(N*N) but that's the worst case really. Not even if you're the unluckiest person in the world you should get the worst case more than maybe once in a lifetime.

Apart from the unluck issue, Quicksort performs very well with both sorted, reversely sorted and random sequences. It's hard to beat Quicksort really.

Mike Simmons
Ranch Hand
Posts: 3090
14
Ulrika Tingle wrote:
fred rosenberger wrote:If i recall correctly, if the dataset is already mostly sorted, quicksort is painfully slow, whereas a bubble sort would be rather peppy.

I don't think you recall correctly. Quicksort is never painfully slow. If you're extremely unlucky (when picking pivot elements) the complexity can degrade to O(N*N) but that's the worst case really. Not even if you're the unluckiest person in the world you should get the worst case more than maybe once in a lifetime.

Apart from the unluck issue, Quicksort performs very well with both sorted, reversely sorted and random sequences. It's hard to beat Quicksort really.

Hmm. I would say that O(N*N) is painfully slow, if you know that O(N*log(N)) is achievable by other techniques. For large enough N, of course.

But I think at a deeper level, the issue here is that Fred is talking about a naive, basic quicksort, such as that shown in pseudocode here, and Ulrika is talking about a randomized quicksort, where you either randomize the choice of a pivot point on each iteration, or just randomly shuffle the list before sorting. A naive quicksort is bad if you apply it to a situation where the input is often almost sorted - because that almost-sorted input will often give you O(N*N) performance. Which can be painfully slow. Randomizing adds a little overhead, but it's O(N) overhead, so it's usually ignorable. And it generally protects you from the worst case scenario, so that any input, even frequent almost-sorted input, will only result in O(N*N) on very, very rare occasions. (And the bigger N is, the rarer the worst-case scenario is.) So a randomized quicksort is usually a much better choice. Unless perhaps you are fairly certain that the input is already pretty random, as opposed to almost sorted, and thus randomizing it some more is of no real benefit.

fred rosenberger
lowercase baba
Bartender
Posts: 12235
36
I'm speaking from experience. Years ago, I was once assigned ticket. Originally, we had a small, sorted file. Every month, new data was added (a few hundred lines). The original developer appended the new data to the end, and ran a quicksort. As time passed, the file got bigger and bigger, until it was several million rows. By adding new data to the end and using a quicksort, it was taking HOURS if not DAYS to finish. I would call that painfully slow.

I re-wrote it using a bubble sort, and it started processing in five minutes.

Ulrika Tingle
Ranch Hand
Posts: 92
fred rosenberger wrote:I re-wrote it using a bubble sort, and it started processing in five minutes.

Everybody knows that Bubblesort is extremely fast when it comes to nearly sorted datasets (but only then). If you have that special situation then by all means use Bubblesort.

But there really was no need to replace Quicksort. You could've used Quicksort to sort the added data only. Then you'd had two sorted datasets which could be merged in just one pass to produce a sorted total dataset. Very simple and efficient indeed (both in space and time).

So although Bubblesort improved the situation it wasn't the optimal solution by far. Still Quicksort which wasn't the real problem was made scapegoat. I think this shows that although personal experience is interesting it's wrong to draw far-reaching general conclussions from it. Quicksort still is the best general sorting algorithm (with Mergesort as runner up).

fred rosenberger
lowercase baba
Bartender
Posts: 12235
36
When the data is sorted, quicksort IS slow. period. Your suggestion of using quicksort to sort the new data changes the paradigm. You are saying "don't use it to sort sorted data, but to sort unsorted data, then it's fast!!!". And that's what I'm saying, too. There are times and places where you should NOT use it.

Further, by replacing the quicksort with a bubble sort, I only had to drop in the logic for that code. To do what you were suggesting would require a major re-write of the program's logic.

Mike Simmons
Ranch Hand
Posts: 3090
14
fred rosenberger wrote:ok...we don't seem to want to preserve my spacing. ignore where my i's and j's show up. It might work if you copy it all to a monospace font.

Nope. I think the problem is that the post wasn't edited with a monospace font - that's not what JForum gives us on an edit. If I use the quote function on your post, the edit window I get seems to have everything lined up fine. But that's not in monospace. Then when you put [ code ] tags on, you do get monospace, and things no longer line up.

Mike Simmons
Ranch Hand
Posts: 3090
14
fred rosenberger wrote:When the data is sorted, quicksort IS slow. period.

Well, for a randomized quicksort, applied against the entire list at once, it's the same speed as a quicksort for unsorted data. It just so happens that there are faster solutions for mostly-sorted data (especially if the unsorted part is at a known location). If you're not using a randomized quicksort, then yes, it can be notably slower on mostly-sorted data than it is for unsorted data.

Ulrika Tingle
Ranch Hand
Posts: 92
fred rosenberger wrote:When the data is sorted, quicksort IS slow. period.

Well, if you're that sure you certainly can point to a reference in support of your claim, can't you? I doubt it though. I'd say that exactly sorted (or inversely sorted) datasets are optimal for Quicksort. It's because then the recursive divisions tend to split in nearly half which is optimal.

There are times and places where you should NOT use it.

Only one time and place.

Quicksort is the best general array sorting algorithm but there's one corner-case where you may consider a Bubblesort instead, namely when the dataset is nearly ordered. And even in that case you may be better off by instead changing "the paradigm" a little, as your example so nicely demonstrates.

Regardless of how the sorting took place before, with my suggestion it effectively would be reduced to a mere copying of the dataset file. It would've been a clean efficient solution without the Bubblesort kludge waiting to blow up on you.

Campbell Ritchie
Sheriff
Posts: 50768
83
Ulrika Tingle wrote: . . . Quicksort is the best general array sorting algorithm . . .
What about when you want two elements with the same value left unchanged. I thought quicksort might exchange them, whereas merge sort leaves them unchanged? That is why the )]Arrays#sort() and similar methods use merge sort. They describe it as stable, meaning you can sort once then sort again on different criteria and get a combined sort. Like sorting on first name then last name, which would give Tingle Ulrich, Tingle Ulrika, Tingle Ulrike and Tingle Vera in order.

Ulrika Tingle
Ranch Hand
Posts: 92
Campbell Ritchie wrote:and similar methods use merge sort.

I mentioned Mergesort as runner up to Quicksort in previous posts but I found it very tedious to keep reapeating it. Mergesort is slightly slower than Quicksort but is stable as you mention which may make up for it.

Ulrika Tingle
Ranch Hand
Posts: 92
fred rosenberger wrote: By adding new data to the end and using a quicksort, it was taking HOURS if not DAYS to finish. I would call that painfully slow.

I re-wrote it using a bubble sort, and it started processing in five minutes.

This huge performance difference really isn't motivated by the algorithms alone. So I'm suspecting that the sorting took place on file. The dataset wasn't first loaded into an array right? I assume it wasn't for the sake of reasoning.

Although you can consider a random access file to be logically equivalent to an array it has physical properties that makes it behave more like a dequeue (piecewise random). That definately isn't a situation Quicksort handles well.

Bubblesort works by repeatedly making sequential passes through the array and this makes a fairly good list sorting algorithm. If the sorting took place on file this property must have strongly favoured Bubblesort over Quicksort. So it's not only the nearly ordered dataset that favours Bubblesort. It's also the fact that Bubblesort share some properties with a file sorting algorithm.

If the sorting took place on file, a standard filesorting algorithm probably would've worked as well as the Bubblesort you developed for the purpose.

Jose Campana
Ranch Hand
Posts: 339
Given the complexity of this algorithm, I was wondering If it was adequate for it to be asked frequently as an interview question ?
I was recently told a guy failed to write down a pseudo-code about it on a blackboard during an interview.
I don't know exactly what to comment on this, but I certainly believe it's not something you happen to learn overnight or otherwise something that happens to be fundamental for your programming survival; given that you can find it solved all over the internet.
Now, that I have seen the length of our discussion about it I rest re-assured, that's probably an advanced topic.

What do you guys think ?

Mike Simmons
Ranch Hand
Posts: 3090
14
Jose: In general I don't think it works very well as a standard interview question that most people should be expected to know. However I can think of a few cases where it might make more sense:

1. The job may be unusually algorithm-intensive, and they really need people who know algorithms well.

2. The candidate may list algorithms on their resume as an area they're particularly strong at. Many interviewers will simply ask about whatever they see on the resume.

3. For a candidate who has just completed a computer science degree but has little or no work experience, questions related to coursework may be more appropriate, simply because there's a shortage of good work-related things to ask about.

Even in these cases, it's probably not the sort of question that a candidate should immediately fail for not knowing the answer. But it's fair for the interviewer to ask a bit about such topics, and see how the candidate reacts.

Ulrika Tingle
Ranch Hand
Posts: 92
Jose Campana wrote:Given the complexity of this algorithm, I was wondering If it was adequate for it to be asked frequently as an interview question ?
I was recently told a guy failed to write down a pseudo-code about it on a blackboard during an interview.
I don't know exactly what to comment on this, but I certainly believe it's not something you happen to learn overnight or otherwise something that happens to be fundamental for your programming survival; given that you can find it solved all over the internet.
Now, that I have seen the length of our discussion about it I rest re-assured, that's probably an advanced topic.

What do you guys think ?

Well, Quicksort is one of the most well-known algorithms in computer science. It's the fasterst sorting algorithm known to man. It's the quintessential divide-and-conquer recursive algoritm. I see no reason really why not every programmer should know how it works.

And it's very easy in principle. Especially the recursive version:

Step 1. Split the array at a pivot position and form three parts: a left partition with all elements that are smaller or equal to the pivot element, the pivot element itself, and a right partition with the rest of the elements.

Step 2. Then build the array together again from the three parts, but first apply step 1 on the left and right partitions unless the partition holds just one element or is empty.

So it's just to split the array into three parts and then build it together again recursively. No big deal really.

I've turned it into a very naive Java implementation. There are no optimization attempts whatsoever. It's not even an in-place sort. Instead new arrays are created to hold the partitions,

I've run some tests on naiveQS and compared it with the built-in Java sort (which uses Mergesort).

For 1 million randomly ordered integers naiveQS takes abot 2 seconds on my computer. The Java sort takes about 0.6 seconds. So naiveQS is just a little more than 3 times slower. This is an amazing result really considering how naively and wastefully implemented naiveQS is!

I also tested naiveQS with an ordered and a reversely ordered array. The sort then took about 1 second in both cases, so it was twice as fast as the random order.

I've noticed that the performace of naiveQS degrades a lot when there are many doubles in the array. The reason is that the left partition may end up with lots of elements equal to the pivot element while the right partition becomes almost empty. This leads to an unbalanced recursion which may even end in a stack overflow. This is a situation a more professional implementation must handle. naiveQS only demonstrates the basic Quicksort principle. It should of course not be used in production code.

Ulrika Tingle
Ranch Hand
Posts: 92
This is the test program,

Mike Simmons
Ranch Hand
Posts: 3090
14
Ulrika Tingle wrote:Well, Quicksort is one of the most well-known algorithms in computer science. It's the fasterst sorting algorithm known to man. It's the quintessential divide-and-conquer recursive algoritm. I see no reason really why not every programmer should know how it works.

I agree it's a great algorithm. But among working programmers with 5, 10 years or more of experience after graduating - it's extremely possible that there has been no practical need to actually code one's own Quicksort during all that time. Thus, I wouldn't really penalize an applicant who didn't remember enough about it to code one up on the spot in an interview. Of course I would want them to be able to discuss some other topics at length, based on their more recent experience.

Another thing to realize is that many working programmers don't actually come from a CS background, so it's possible that they never actually studied quicksort. Personally I studied Engineering & Physics, and picked up programming initially mostly for numerical methods. Quicksort was never covered in any course I took, though I do remember hearing the name. I chose to learn about it later, on my own, when computers became my career - because I think algorithms are cool. But frankly there was never any compelling need to know it, based on any job I worked at. It never came up. Library routines like Collections.sort() and TreeSet were more than good enough, the vast majority of the time. So while I do think it's cool to know about algorithms, many people I've worked with don't seem to find it that important, and I can't really blame them, based on my own experience.

Ulrika Tingle
Ranch Hand
Posts: 92
Mike Simmons wrote:I agree it's a great algorithm. But among working programmers with 5, 10 years or more of experience after graduating - it's extremely possible that there has been no practical need to actually code one's own Quicksort during all that time. Thus, I wouldn't really penalize an applicant who didn't remember enough about it to code one up on the spot in an interview. Of course I would want them to be able to discuss some other topics at length, based on their more recent experience.

There are lots of things you know that you don't make use of daily, but that doesn't mean this knowledge is unimportant. Most programmers never again code any of the basic algorithms & data structures they studied in school. There's almost always some handy library available. But this doesn't mean in-depth knowledge is unimportant or wasted. In fact it's very important because otherwise you risk misusing powerful libraries to disastreous effect.

When you hire a Java programmer one thing you want to know (apart from coding and social skills) is the extent of this person's knowledge base. To probe this you don't ask about daily stuff, like how to use the Java collections or something. Your question must concern something more deep and fundamental and why not Quicksort really?

First, Quicksort is simple (in principle) so it's not like you need to be a genius to account for it. What it requires is familiarity with recursion and the divide-and-conquer approach. Second, it's a well-known almost mythical algorithm. If you're just a little curious you most certainly would want to know how it works.

I guess the company must've considered knowledge of Quicksort a good litmus test to help identify the programmer they were looking for. But it would be a little hard to flunk an applicant on this question alone. If that's what they did. There could be other reasons why this particular applicant wasn't considered the right person for the job.