
# Algorithms and Performance

victor kamat
Ranch Hand
Posts: 247
At a recent interview I was asked this question:

You have an int[] array of size n.
You want to get the maximum from the array.
You can use either algorithm A or algorithm B.

algorithm A: best case O(n), worst case O(n^2), memory usage O(1);
algorithm B: best & worst case O(nlog(n)), memory usage O(n).

Question 1) If n <= 1000, would you use A or B?
Question 2) If n > 1000, would you use A or B?

It seems to me that if memory is plentiful, then use B for 1) and 2).
If memory is a constraint then for 1) use B and for 2) use A.

I'd appreciate your thoughts on this.

Pat Farrell
Rancher
Posts: 4678
7
O(n^2) gets really bad quickly. With n = 1000, maybe it's OK. With 2000, it's bad, so it's clear you want the n ln(n) algorithm.

Memory is nearly free these days, unless you have really big problems, say keeping the birthdates of everyone in the US or India.

Ernest Friedman-Hill
author and iconoclast
Marshal
Posts: 24212
35
Yes, I'm in agreement with Pat.

I think I would have pointed out to the interviewer that I (and most people!) know how to find the maximum of an array in guaranteed O(n) time with O(1) storage, but assuming the constraints are real because you say so...
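For reference, the guaranteed O(n)-time, O(1)-space approach Ernest mentions is a single pass over the array (a minimal sketch; the class and method names are mine):

```java
// Linear scan for the maximum: one pass, O(n) time, O(1) extra space.
public class MaxScan {
    static int max(int[] a) {
        if (a == null || a.length == 0) {
            throw new IllegalArgumentException("array must be non-empty");
        }
        int best = a[0];
        for (int i = 1; i < a.length; i++) {
            if (a[i] > best) best = a[i];
        }
        return best;
    }

    public static void main(String[] args) {
        System.out.println(max(new int[]{5, 2, 9, 1}));  // prints 9
    }
}
```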

Ilja Preuss
author
Sheriff
Posts: 14112
Well, I'm not sure it's that easy. This being a totally contrived example, it could still be that O(n^2) is faster than O(n log n) for, say, all n < 10000. So one thing I'd like to know is whether we know an upper bound for the n > 1000 case. If we do, it would be interesting to run both algorithms at that upper bound and see which one is faster. To be really sure, we'd also need to know *typical* values for n.

Same goes for memory usage. O(1) could easily mean "always needs one terabyte", whereas O(n) could mean "needs n bits". And how much memory are we allowed to consume at a maximum? I mean, sure, memory is plentiful - unless you are developing for some kind of embedded device or something.

In other words, the problem is way too underconstrained to give a simple answer. (It's quite possible that the interviewer was looking for someone who noticed that.)

Pat Farrell
Rancher
Posts: 4678
7
Originally posted by Ilja Preuss:
This being a totally contrived example, it could still be that O(n^2) is faster than O(n log n) for, say, all n < 10000.

Yes, since O() doesn't talk about the constants. And if the constants look like

1000000 * N ln N versus 2 * N^2
then clearly the N^2 is faster for small values of N.
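With Pat's hypothetical constants, you can just plug in values of N and see where the crossover happens (a sketch; the constants are the made-up ones from the post, not measurements):

```java
// Compares two hypothetical cost functions: 1000000 * n * ln(n) vs 2 * n^2.
// The quadratic one wins until n/ln(n) exceeds 500000, i.e. until n is in the
// millions - despite its "worse" big-O class.
public class Crossover {
    static double costNLogN(long n)     { return 1_000_000.0 * n * Math.log(n); }
    static double costQuadratic(long n) { return 2.0 * n * n; }

    public static void main(String[] args) {
        for (long n : new long[]{1_000, 100_000, 10_000_000}) {
            String winner = costQuadratic(n) < costNLogN(n) ? "N^2" : "N log N";
            System.out.println("n=" + n + ": " + winner + " is cheaper");
        }
    }
}
```

For n = 1000 (and even n = 100000) the quadratic cost is lower here; it only loses once n reaches the millions.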

Tim Holloway
Saloon Keeper
Posts: 18319
56
Actually, the FIRST thing I'd do is see if there weren't other algorithms, or variants of these two, that were more suitable based on the actual data that would be processed.

One of my favorite examples was a very practical real-life situation. We had a list averaging a thousand objects more or less that was at the heart of a critical systems application that ran literally hundreds of times a day. This list needed to be sorted. QuickSort and HeapSort would have operated in worst-case mode, since the input data was ALMOST in order, except for some glaring exceptions. But those glaring exceptions also made it worst-case for bubble sort. The optimal solution for the problem was a Shell-Metzner sort, which quickly moved the offending items near their proper locations and then bubbled them into place.
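A Shell sort is essentially insertion sort run over shrinking gaps: the large gaps move far-displaced items close to their proper positions quickly, and the final gap-1 pass bubbles them the rest of the way. A minimal sketch (the gap sequence and names are mine, not Tim's original code):

```java
import java.util.Arrays;

// Shell sort: gapped insertion sort. On nearly-sorted input with a few
// far-out-of-place elements, the large gaps move the offenders near their
// spots quickly; the final gap-1 pass is then almost linear.
public class ShellSortDemo {
    static void shellSort(int[] a) {
        for (int gap = a.length / 2; gap > 0; gap /= 2) {
            for (int i = gap; i < a.length; i++) {
                int key = a[i];
                int j = i;
                // Shift gap-separated elements right until key fits.
                while (j >= gap && a[j - gap] > key) {
                    a[j] = a[j - gap];
                    j -= gap;
                }
                a[j] = key;
            }
        }
    }

    public static void main(String[] args) {
        int[] data = {1, 2, 3, 99, 4, 5, 0, 6, 7, 8};  // nearly sorted, two offenders
        shellSort(data);
        System.out.println(Arrays.toString(data));  // prints [0, 1, 2, 3, 4, 5, 6, 7, 8, 99]
    }
}
```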

Bottom line: Algorithms only exist in isolation when under academic study. The "best" algorithm depends on both the data it's likely to have to process and on the resources available. If I have an algorithm that consumes 100,000 slots in memory, but the memory is virtual memory that's going to be paged in and out, it's not really as fast as it looks.

I once heard someone complain that OS/2 was slower than his Commodore 64, but that's because he'd set up a program in BASIC using a very large sparse matrix accessed randomly and caused the virtual memory manager to thrash.

I also had a case where storage was a potential limiting factor, doing a Black-Scholes system where I had to build a pyramid of projected interest rates going out 40 years. The original author had committed an atrocity - he'd built a square based on the product of the maximum width x 40 years (in months, no less). A more intelligent approach would have been to cut the square in half, since half the slots would have never been used anyway.

I went one step further. Since all I really cared about was end products, I set up a pair of vectors, each being the maximum width of the pyramid, and bounced the intermediate calculations back and forth between them. Since one generation directly produced the next, there was no need for any of the other layers of the pyramid to even exist in RAM. So I recycled them.
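The two-buffer trick Tim describes can be sketched in miniature. This is an illustration only, not his Black-Scholes code: Pascal's triangle stands in for the rate pyramid, since each row there is also produced directly from the previous one:

```java
// Two-buffer ("ping-pong") recycling: each lattice level is computed from the
// previous one alone, so only two rows ever need to exist in memory, not the
// whole pyramid. Pascal's triangle stands in for the interest-rate lattice.
public class PingPong {
    static long[] lastRow(int levels) {
        long[] cur  = new long[levels + 1];
        long[] next = new long[levels + 1];
        cur[0] = 1;  // level 0 is a single 1
        for (int level = 1; level <= levels; level++) {
            next[0] = 1;
            for (int i = 1; i <= level; i++) {
                next[i] = cur[i - 1] + (i < level ? cur[i] : 0);
            }
            long[] tmp = cur; cur = next; next = tmp;  // swap the two buffers
        }
        return java.util.Arrays.copyOf(cur, levels + 1);
    }

    public static void main(String[] args) {
        System.out.println(java.util.Arrays.toString(lastRow(4)));  // prints [1, 4, 6, 4, 1]
    }
}
```

Only two vectors exist at any moment, regardless of how many levels deep the pyramid goes.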