
Algorithms and Performance

 
victor kamat
Ranch Hand
Posts: 247
At a recent interview I was asked this question:

You have an int[] array of size n.
You want to get the maximum from the array.
You can use either algorithm A or algorithm B.

algorithm A: best case O(n), worst case O(n^2), memory usage O(1);
algorithm B: best & worst case O(nlog(n)), memory usage O(n).

Question 1) If n <= 1000, would you use A or B?
Question 2) If n > 1000, would you use A or B?

It seems to me that if memory is plentiful, then use B for both 1) and 2).
If memory is a constraint, then for 1) use B and for 2) use A.

I'd appreciate your thoughts on this.
 
Pat Farrell
Rancher
Posts: 4678
O(n^2) gets really bad quickly. With n = 1000, maybe it's OK. With 2000, it's bad, so it's clear you want the n log(n) algorithm.

Memory is nearly free these days, unless you have really big problems, say keeping the birthdates of everyone in the US or India.
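To put rough numbers on that (pure step counts, no constant factors, so only illustrative):

    // Compare n^2 against n log2 n for a few sizes around the 1000 boundary.
    public static void main(String[] args) {
        int[] sizes = {1_000, 2_000, 10_000, 100_000};
        for (int n : sizes) {
            double quadratic = (double) n * n;
            double linearithmic = n * (Math.log(n) / Math.log(2));
            System.out.printf("n=%,d: n^2 = %.0f, n log2 n = %.0f%n",
                              n, quadratic, linearithmic);
        }
    }

At n = 1000 the quadratic count is already about 100 times larger, and the gap keeps widening as n grows.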
 
Ernest Friedman-Hill
author and iconoclast
Sheriff
Posts: 24213
Yes, I'm in agreement with Pat.

I think I would have pointed out to the interviewer that I (and most people!) know how to find the maximum of an array in guaranteed O(n) time with O(1) storage, but assuming the constraints are real because you say so...
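For reference, a minimal sketch of that guaranteed O(n) time / O(1) space scan in Java (this is just the textbook linear pass, not the interview's algorithm A or B):

    // One pass over the array, a single int of extra state.
    public static int max(int[] a) {
        if (a == null || a.length == 0) {
            throw new IllegalArgumentException("array must be non-empty");
        }
        int best = a[0];
        for (int i = 1; i < a.length; i++) {
            if (a[i] > best) {
                best = a[i];
            }
        }
        return best;
    }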
 
Ilja Preuss
author
Sheriff
Posts: 14112
Well, I'm not sure it's that easy. This being a totally contrived example, it could still be that O(n^2) is faster than O(n log n) for, say, all n < 10000. So one thing I'd like to know is whether we know an upper bound for the n > 1000 case. If we do, it would be interesting to run both algorithms against that upper bound and see which one is faster. To be really sure, we would also need to know *typical* values for n.

Same goes for memory usage. O(1) could easily mean "always needs one terabyte", whereas O(n) could mean "needs n bits". And how much memory are we allowed to consume at a maximum? I mean, sure, memory is plentiful - unless you are developing for some kind of embedded device or something.

In other words, the problem is way too underconstrained to give a simple answer. (It's quite possible that the interviewer was looking for someone who noticed that.)
 
Pat Farrell
Rancher
Posts: 4678
Originally posted by Ilja Preuss:
This being a totally contrived example, it could still be that O(n^2) is faster than O(n log n) for, say, all n < 10000.


Yes, since O() doesn't talk about the constants. And if the constants look like

1000000 * N ln N versus 2 * N^2

then clearly the N^2 version is faster for small values of N.
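To make that concrete, a quick sketch using those two hypothetical cost functions (the constants 1000000 and 2 are just the made-up ones above) that prints which one is cheaper as n grows:

    // Where does 2 * n^2 stop beating 1000000 * n * ln(n)?
    public class CrossoverDemo {
        static double nLogN(long n) { return 1_000_000.0 * n * Math.log(n); }
        static double nSquared(long n) { return 2.0 * n * n; }

        public static void main(String[] args) {
            for (long n = 1_000; n <= 100_000_000; n *= 10) {
                String cheaper = nSquared(n) < nLogN(n) ? "2 * N^2" : "1000000 * N ln N";
                System.out.printf("n = %,d -> cheaper: %s%n", n, cheaper);
            }
        }
    }

With these constants the quadratic version stays cheaper until n is somewhere in the millions.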
 
Tim Holloway
Bartender
Posts: 18408
Actually, the FIRST thing I'd do is see whether there were other algorithms, or variants of these two, that were more suitable based on the actual data that would be processed.

One of my favorite examples was a very practical real-life situation. We had a list averaging a thousand objects more or less that was at the heart of a critical systems application that ran literally hundreds of times a day. This list needed to be sorted. QuickSort and HeapSort would have operated in worst-case mode, since the input data was ALMOST in order, except for some glaring exceptions. But those glaring exceptions also made it worst-case for bubble sort. The optimal solution for the problem was a Shell-Metzner sort, which quickly moved the offending items near their proper locations and then bubbled them into place.
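For anyone who hasn't seen it, a minimal Shell sort sketch in Java (the classic halving-gap form; the exact Shell-Metzner gap sequence differs in detail, so treat this as illustrative). The large gaps move the few badly misplaced items close to home cheaply, and the final gap of 1 is essentially an insertion/bubble pass over nearly sorted data:

    // Shell sort with a halving gap sequence.
    public static void shellSort(int[] a) {
        for (int gap = a.length / 2; gap > 0; gap /= 2) {
            // Gapped insertion sort: elements far out of place travel
            // long distances in a few moves.
            for (int i = gap; i < a.length; i++) {
                int value = a[i];
                int j = i;
                while (j >= gap && a[j - gap] > value) {
                    a[j] = a[j - gap];
                    j -= gap;
                }
                a[j] = value;
            }
        }
    }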

Bottom line: Algorithms only exist in isolation when under academic study. The "best" algorithm depends on both the data it's likely to have to process and on the resources available. If I have an algorithm that consumes 100,000 slots in memory, but the memory is virtual memory that's going to be paged in and out, it's not really as fast as it looks.

I once heard someone complain that OS/2 was slower than his Commodore 64, but that's because he'd set up a program in BASIC using a very large sparse matrix accessed randomly and caused the virtual memory manager to thrash.

I also had a case where storage was a potential limiting factor, doing a Black-Scholes system where I had to build a pyramid of projected interest rates going out 40 years. The original author had committed an atrocity - he'd built a square based on the product of the maximum width x 40 years (in months, no less). A more intelligent approach would have been to cut the square in half, since half the slots would have never been used anyway.

I went one step further. Since all I really cared about was the end products, I set up a pair of vectors, each the maximum width of the pyramid, and bounced the intermediate calculations back and forth between them. Since one generation directly produced the next, there was no need for any of the other layers of the pyramid to even exist in RAM, so I recycled them.
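As a rough illustration of that two-buffer trick (the combine() rule below is a made-up placeholder, not the actual projected-rate formula), each level of the pyramid is produced entirely from the previous one, so two arrays the width of the final level are all that ever needs to be in memory:

    // Two-buffer "ping-pong": keep only the current and next levels,
    // swapping them, instead of the whole width x depth pyramid.
    public static double[] lastLevel(int levels, double seed) {
        double[] current = new double[levels + 1];
        double[] next = new double[levels + 1];
        current[0] = seed;

        for (int level = 0; level < levels; level++) {
            for (int i = 0; i <= level + 1; i++) {
                next[i] = combine(current, level, i);
            }
            double[] tmp = current;   // recycle the buffers
            current = next;
            next = tmp;
        }
        return current;               // only the final level survives
    }

    // Hypothetical combining rule standing in for the real rate math.
    private static double combine(double[] prev, int level, int i) {
        double left = (i > 0) ? prev[i - 1] : 0.0;
        double right = (i <= level) ? prev[i] : 0.0;
        return 0.5 * (left + right);
    }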
 