Justin Chu

Ranch Hand

Posts: 209

1

posted 9 years ago

How to implement an algorithm that averages an array of N doubles, and is relatively free from bug? It should take care of overflows, and be reasonably fast.

Simply accumulating and then divide is susceptible to overflow while dividing all values by N prior to summing them takes 2x the required operations to calculate.

[ October 22, 2007: Message edited by: Chu Tan ]

Simply accumulating and then divide is susceptible to overflow while dividing all values by N prior to summing them takes 2x the required operations to calculate.

[ October 22, 2007: Message edited by: Chu Tan ]

Jim Yingst

Wanderer

Sheriff

Sheriff

Posts: 18671

posted 9 years ago

With doubles? Do you realize how big the numbers have to be to get numeric overflow with doubles? I have a hard time imagining a problem where it's worth worrying about. If you had an array of ints or longs, that's one thing - those might actually overflow. But doubles really won't, not in any remotely real-world kind of problem.

I suppose if you really really need to handle this possibility, you could first do the calculation the fast way, and check if the result is Double.POSITIVE_INFINITY or Double.NEGATIVE_INFINITY. If not, no overflow has occurred, and you're done. If so, then you could repeat the calculation in a slower mode. Perhaps you could divide the array into 10 (roughly) equal parts. Take the sum of each part separately, then divide each by ten, and add the results. So if your array has 1000 elements, you'll split it into ten 100-element arrays, take the sum of each, divide each by 10, and add the results. That way you only divide by ten ten times. Check for overflow (infinity) with each sum, and if it occurs, repeat the sum by subdividing the array into smaller subdivisions, e.g. one hundred 10-element arrays (dividing each sum by 100).

In this way, you only do a large number of divisions in the extremely rare circumstance of near-overflow. Which will never happen anyway in real life - but if you force it to happen as a test condition, the calculation can still proceed. It's just slow.

But really, I'd just ignore this possibility for doubles, and for floats. For an array of ints, I'd take the sum using a long. And for an array of longs, I'd probably just take the sum using a double. Overflow would be impossible in those situations.

I suppose if you really really need to handle this possibility, you could first do the calculation the fast way, and check if the result is Double.POSITIVE_INFINITY or Double.NEGATIVE_INFINITY. If not, no overflow has occurred, and you're done. If so, then you could repeat the calculation in a slower mode. Perhaps you could divide the array into 10 (roughly) equal parts. Take the sum of each part separately, then divide each by ten, and add the results. So if your array has 1000 elements, you'll split it into ten 100-element arrays, take the sum of each, divide each by 10, and add the results. That way you only divide by ten ten times. Check for overflow (infinity) with each sum, and if it occurs, repeat the sum by subdividing the array into smaller subdivisions, e.g. one hundred 10-element arrays (dividing each sum by 100).

In this way, you only do a large number of divisions in the extremely rare circumstance of near-overflow. Which will never happen anyway in real life - but if you force it to happen as a test condition, the calculation can still proceed. It's just slow.

But really, I'd just ignore this possibility for doubles, and for floats. For an array of ints, I'd take the sum using a long. And for an array of longs, I'd probably just take the sum using a double. Overflow would be impossible in those situations.

"I'm not back." - Bill Harding, *Twister*

Justin Chu

Ranch Hand

Posts: 209

1

posted 9 years ago

Thanks, I figure that it wouldn't be an issue 99% of the time for finding mean.

How about calculating variance? It requires sum of squares that can overflow if trying to find the variance of 1000000 data points?

I've found wikipedia's http://en.wikipedia.org/wiki/Algorithms_for_calculating_variance . Are there any open source packages that will solve such problem?

How about calculating variance? It requires sum of squares that can overflow if trying to find the variance of 1000000 data points?

I've found wikipedia's http://en.wikipedia.org/wiki/Algorithms_for_calculating_variance . Are there any open source packages that will solve such problem?

bart zagers

Ranch Hand

Posts: 234

posted 9 years ago

There is the Apache Commons Math project, which has this type of calculations. You could take a peek at how they implement things.