posted 10 years ago
How to implement an algorithm that averages an array of N doubles, and is relatively free from bug? It should take care of overflows, and be reasonably fast.
Simply accumulating and then divide is susceptible to overflow while dividing all values by N prior to summing them takes 2x the required operations to calculate.
[ October 22, 2007: Message edited by: Chu Tan ]
Simply accumulating and then divide is susceptible to overflow while dividing all values by N prior to summing them takes 2x the required operations to calculate.
[ October 22, 2007: Message edited by: Chu Tan ]
posted 10 years ago
With doubles? Do you realize how big the numbers have to be to get numeric overflow with doubles? I have a hard time imagining a problem where it's worth worrying about. If you had an array of ints or longs, that's one thing  those might actually overflow. But doubles really won't, not in any remotely realworld kind of problem.
I suppose if you really really need to handle this possibility, you could first do the calculation the fast way, and check if the result is Double.POSITIVE_INFINITY or Double.NEGATIVE_INFINITY. If not, no overflow has occurred, and you're done. If so, then you could repeat the calculation in a slower mode. Perhaps you could divide the array into 10 (roughly) equal parts. Take the sum of each part separately, then divide each by ten, and add the results. So if your array has 1000 elements, you'll split it into ten 100element arrays, take the sum of each, divide each by 10, and add the results. That way you only divide by ten ten times. Check for overflow (infinity) with each sum, and if it occurs, repeat the sum by subdividing the array into smaller subdivisions, e.g. one hundred 10element arrays (dividing each sum by 100).
In this way, you only do a large number of divisions in the extremely rare circumstance of nearoverflow. Which will never happen anyway in real life  but if you force it to happen as a test condition, the calculation can still proceed. It's just slow.
But really, I'd just ignore this possibility for doubles, and for floats. For an array of ints, I'd take the sum using a long. And for an array of longs, I'd probably just take the sum using a double. Overflow would be impossible in those situations.
I suppose if you really really need to handle this possibility, you could first do the calculation the fast way, and check if the result is Double.POSITIVE_INFINITY or Double.NEGATIVE_INFINITY. If not, no overflow has occurred, and you're done. If so, then you could repeat the calculation in a slower mode. Perhaps you could divide the array into 10 (roughly) equal parts. Take the sum of each part separately, then divide each by ten, and add the results. So if your array has 1000 elements, you'll split it into ten 100element arrays, take the sum of each, divide each by 10, and add the results. That way you only divide by ten ten times. Check for overflow (infinity) with each sum, and if it occurs, repeat the sum by subdividing the array into smaller subdivisions, e.g. one hundred 10element arrays (dividing each sum by 100).
In this way, you only do a large number of divisions in the extremely rare circumstance of nearoverflow. Which will never happen anyway in real life  but if you force it to happen as a test condition, the calculation can still proceed. It's just slow.
But really, I'd just ignore this possibility for doubles, and for floats. For an array of ints, I'd take the sum using a long. And for an array of longs, I'd probably just take the sum using a double. Overflow would be impossible in those situations.
"I'm not back."  Bill Harding, Twister
Justin Chu
Ranch Hand
Posts: 209
1
posted 10 years ago
Thanks, I figure that it wouldn't be an issue 99% of the time for finding mean.
How about calculating variance? It requires sum of squares that can overflow if trying to find the variance of 1000000 data points?
I've found wikipedia's http://en.wikipedia.org/wiki/Algorithms_for_calculating_variance . Are there any open source packages that will solve such problem?
How about calculating variance? It requires sum of squares that can overflow if trying to find the variance of 1000000 data points?
I've found wikipedia's http://en.wikipedia.org/wiki/Algorithms_for_calculating_variance . Are there any open source packages that will solve such problem?
posted 10 years ago
There is the Apache Commons Math project, which has this type of calculations. You could take a peek at how they implement things.
Bras cause cancer. And tiny ads:
The WEB SERVICES and JAXRS Course
https://coderanch.com/t/690789/WEBSERVICESJAXRS
