victor kamat

Ranch Hand

Posts: 247

posted 8 years ago

At a recent interview I was put this question:

You have an int[] array of size n.

You want to get the maximum from the array.

You can use either algorithm A or algorithm B.

algorithm A: best case O(n), worst case O(n^2), memory usage O(1);

algorithm B: best & worst case O(nlog(n)), memory usage O(n).

Question 1) If n <= 1000, would you use A or B

2) If n > 1000 would you use A or B.

It seems to me that if memory is plentiful, then use B for 1) and 2).

If memory is a constraint then for 1) use B and for 2) use A.

I'd appreciate your thoughts on this.

You have an int[] array of size n.

You want to get the maximum from the array.

You can use either algorithm A or algorithm B.

algorithm A: best case O(n), worst case O(n^2), memory usage O(1);

algorithm B: best & worst case O(nlog(n)), memory usage O(n).

Question 1) If n <= 1000, would you use A or B

2) If n > 1000 would you use A or B.

It seems to me that if memory is plentiful, then use B for 1) and 2).

If memory is a constraint then for 1) use B and for 2) use A.

I'd appreciate your thoughts on this.

Ilja Preuss

author

Sheriff

Sheriff

Posts: 14112

posted 8 years ago

Well, I'm not sure it's that easy. This being a totally contrived example, it could still be that O(n^2) is faster than O(n log n) for, say, all n < 10000. So one thing I'd like to know is whether we know about an upper bound for the n > 1000 case. If we know one, it would be interesting to run both algorithms against that upper bound and see which one is faster. To be really sure, we also needed to know *typical* values for n.

Same goes for memory usage. O(1) could easily mean "always needs one terra byte", whereas O(n) could mean "needs n bits". And how much memory are we allowed to consume at a maximum? I mean, sure, memory is plentiful - unless you are developing for some kind of embedded device or something.

With other words, the problem is ways underconstrained to give a simple answer. (It's quite possible that the interviewer was looking for someone who noticed that.)

Same goes for memory usage. O(1) could easily mean "always needs one terra byte", whereas O(n) could mean "needs n bits". And how much memory are we allowed to consume at a maximum? I mean, sure, memory is plentiful - unless you are developing for some kind of embedded device or something.

With other words, the problem is ways underconstrained to give a simple answer. (It's quite possible that the interviewer was looking for someone who noticed that.)

The soul is dyed the color of its thoughts. Think only on those things that are in line with your principles and can bear the light of day. The content of your character is your choice. Day by day, what you do is who you become. Your integrity is your destiny - it is the light that guides your way. - Heraclitus

posted 8 years ago

Yes, since O() doesn't talk about the constants. And if the constants look like

1000000 * N ln N versus 2 * N ^2

then clearly the N^2 is faster for small values of N

Originally posted by Ilja Preuss:

This being a totally contrived example, it could still be that O(n^2) is faster than O(n log n) for, say, all n < 10000.

Yes, since O() doesn't talk about the constants. And if the constants look like

1000000 * N ln N versus 2 * N ^2

then clearly the N^2 is faster for small values of N

posted 8 years ago

Actually, the FIRST thing I'd do, is see if there weren't other, more suitable algorithms or variants of these two that made them more suitable, based on the actual data that would be processed.

One of my favorite examples was a very practical real-life situation. We had a list averaging a thousand objects more or less that was at the heart of a critical systems application that ran literally hundreds of times a day. This list needed to be sorted. QuickSort and HeapSort would have operated in worst-case mode, since the input data was ALMOST in order, except for some glaring exceptions. But those glaring exceptions also made it worst-case for bubble sort. The optimal solution for the problem was a Shell-Metzner sort, which quickly moved the offending items near their proper locations and then bubbled them into place.

Bottom line: Algorithms only exist in isolation when under academic study. The "best" algorithm depends on both the data it's likely to have to process and on the resources available. If I have an algorithm that consumes 100,000 slots in memory, but the memory is virtual memory that's going to be paged in and out, it's not really as fast as it looks.

I once heard someone complain that OS/2 was slower than his Commodore 64, but that's because he'd set up a program in BASIC using a very large sparse matrix accessed randomly and caused the virtual memory manager to thrash.

I also had a case where storage was a potential limiting factor, doing a Black-Scholes system where I had to build a pyramid of projected interest rates going out 40 years. The original author had committed an atrocity - he'd built a square based on the product of the maximum width x 40 years (in months, no less). A more intelligent approach would have been to cut the square in half, since half the slots would have never been used anyway.

I went one step further. Since all I really cared about was end products, I set up a pair of vectors, each being the maximum width of the pyramid and bounced the intermediate calculations back and forth between them. There since one generation directly produced the next, there was no need for any of the other layers of the pyramid to even exist in RAM. So I recycled them.

One of my favorite examples was a very practical real-life situation. We had a list averaging a thousand objects more or less that was at the heart of a critical systems application that ran literally hundreds of times a day. This list needed to be sorted. QuickSort and HeapSort would have operated in worst-case mode, since the input data was ALMOST in order, except for some glaring exceptions. But those glaring exceptions also made it worst-case for bubble sort. The optimal solution for the problem was a Shell-Metzner sort, which quickly moved the offending items near their proper locations and then bubbled them into place.

Bottom line: Algorithms only exist in isolation when under academic study. The "best" algorithm depends on both the data it's likely to have to process and on the resources available. If I have an algorithm that consumes 100,000 slots in memory, but the memory is virtual memory that's going to be paged in and out, it's not really as fast as it looks.

I once heard someone complain that OS/2 was slower than his Commodore 64, but that's because he'd set up a program in BASIC using a very large sparse matrix accessed randomly and caused the virtual memory manager to thrash.

I also had a case where storage was a potential limiting factor, doing a Black-Scholes system where I had to build a pyramid of projected interest rates going out 40 years. The original author had committed an atrocity - he'd built a square based on the product of the maximum width x 40 years (in months, no less). A more intelligent approach would have been to cut the square in half, since half the slots would have never been used anyway.

I went one step further. Since all I really cared about was end products, I set up a pair of vectors, each being the maximum width of the pyramid and bounced the intermediate calculations back and forth between them. There since one generation directly produced the next, there was no need for any of the other layers of the pyramid to even exist in RAM. So I recycled them.

An IDE is no substitute for an Intelligent Developer.

With a little knowledge, a cast iron skillet is non-stick and lasts a lifetime. |