This week's book giveaway is in the Kotlin forum.
We're giving away four copies of Kotlin in Action and have Dmitry Jemerov & Svetlana Isakova on-line!
See this thread for details.
Win a copy of Kotlin in Action this week in the Kotlin forum!
  • Post Reply Bookmark Topic Watch Topic
  • New Topic

Average algorithm  RSS feed

 
Justin Chu
Ranch Hand
Posts: 209
1
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
How to implement an algorithm that averages an array of N doubles, and is relatively free from bug? It should take care of overflows, and be reasonably fast.

Simply accumulating and then divide is susceptible to overflow while dividing all values by N prior to summing them takes 2x the required operations to calculate.
[ October 22, 2007: Message edited by: Chu Tan ]
 
Jim Yingst
Wanderer
Sheriff
Posts: 18671
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
With doubles? Do you realize how big the numbers have to be to get numeric overflow with doubles? I have a hard time imagining a problem where it's worth worrying about. If you had an array of ints or longs, that's one thing - those might actually overflow. But doubles really won't, not in any remotely real-world kind of problem.

I suppose if you really really need to handle this possibility, you could first do the calculation the fast way, and check if the result is Double.POSITIVE_INFINITY or Double.NEGATIVE_INFINITY. If not, no overflow has occurred, and you're done. If so, then you could repeat the calculation in a slower mode. Perhaps you could divide the array into 10 (roughly) equal parts. Take the sum of each part separately, then divide each by ten, and add the results. So if your array has 1000 elements, you'll split it into ten 100-element arrays, take the sum of each, divide each by 10, and add the results. That way you only divide by ten ten times. Check for overflow (infinity) with each sum, and if it occurs, repeat the sum by subdividing the array into smaller subdivisions, e.g. one hundred 10-element arrays (dividing each sum by 100).

In this way, you only do a large number of divisions in the extremely rare circumstance of near-overflow. Which will never happen anyway in real life - but if you force it to happen as a test condition, the calculation can still proceed. It's just slow.

But really, I'd just ignore this possibility for doubles, and for floats. For an array of ints, I'd take the sum using a long. And for an array of longs, I'd probably just take the sum using a double. Overflow would be impossible in those situations.
 
Justin Chu
Ranch Hand
Posts: 209
1
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
Thanks, I figure that it wouldn't be an issue 99% of the time for finding mean.

How about calculating variance? It requires sum of squares that can overflow if trying to find the variance of 1000000 data points?

I've found wikipedia's http://en.wikipedia.org/wiki/Algorithms_for_calculating_variance . Are there any open source packages that will solve such problem?
 
bart zagers
Ranch Hand
Posts: 234
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
There is the Apache Commons Math project, which has this type of calculations. You could take a peek at how they implement things.
 
  • Post Reply Bookmark Topic Watch Topic
  • New Topic
Boost this thread!