I am writing a program that needs to do a relatively large amount of tough calculations on a relatively large amount of numbers, which are doubles. While developing I can store all of these in an array just fine, because the test data sets I use are very small. However, to put the program to some real tests I need to store about 25 million such doubles (approx. 10 minutes of sound samples). When I try to make an array this large I get this:
Exception in thread "main" java.lang.OutOfMemoryError: Java heap space
Now, do I have to live with this and make some kind of wrapper class "glueing" many arrays together? Or is there some other structure out there capable of this?
Well, the easiest thing to try is increasing your JVM heap size. Try running your program with something like
java -Xmx300M MyClass
which sets your maximum heap size to 300 MB. (25 million doubles at 8 bytes each take about 200 MB; I don't know how much extra space you'll need.) If that's more memory than your machine has (or if you anticipate needing to deal with even larger sound files) then you'll have to try something else which allows you to keep most of the data somewhere other than in memory. A file seems like the most likely choice. You could use streams to read from one file, and optionally write out a new file (assuming you're changing the data in any way that needs to be saved).
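Here is a rough sketch of the stream idea, not a definitive implementation. It assumes the samples are stored as raw 8-byte big-endian doubles (the format DataOutputStream.writeDouble produces); the class name StreamSamples and the process() method are made up for illustration, so substitute your own calculation.

```java
import java.io.BufferedInputStream;
import java.io.BufferedOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.EOFException;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;

class StreamSamples {

    // Hypothetical per-sample calculation; here it just halves the amplitude.
    static double process(double sample) {
        return sample * 0.5;
    }

    // Reads doubles one at a time from inPath and writes the processed
    // values to outPath. Returns the number of samples handled.
    static long processFile(String inPath, String outPath) throws IOException {
        long count = 0;
        try (DataInputStream in = new DataInputStream(
                 new BufferedInputStream(new FileInputStream(inPath)));
             DataOutputStream out = new DataOutputStream(
                 new BufferedOutputStream(new FileOutputStream(outPath)))) {
            while (true) {
                double sample;
                try {
                    sample = in.readDouble();
                } catch (EOFException endOfData) {
                    break; // no more complete doubles in the input
                }
                out.writeDouble(process(sample));
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) throws IOException {
        long n = processFile(args[0], args[1]);
        System.out.println("Processed " + n + " samples");
    }
}
```

Only one sample is held in memory at a time (plus the stream buffers), so the heap size no longer depends on the length of the recording.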
This works pretty well as long as you only need one number at a time, in order. Otherwise you may need to do something more complicated.
Alternatively, you may find that the java.nio classes will work better for you here - particularly FileChannel's map() method.
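A minimal sketch of the memory-mapping idea, under a couple of assumptions: the file holds raw big-endian doubles (the default byte order of a ByteBuffer), and it is small enough to map in one go. The class name MappedSamples and the scaleInPlace() method are hypothetical, standing in for whatever calculation you actually need.

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.DoubleBuffer;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;

class MappedSamples {

    // Multiplies every sample in the file by factor, in place.
    // Returns the number of samples in the file.
    static int scaleInPlace(String path, double factor) throws IOException {
        try (RandomAccessFile file = new RandomAccessFile(path, "rw");
             FileChannel channel = file.getChannel()) {
            // Map the whole file; map() rejects sizes above Integer.MAX_VALUE.
            MappedByteBuffer bytes =
                channel.map(FileChannel.MapMode.READ_WRITE, 0, channel.size());
            // View the mapped bytes as doubles, with random access by index.
            DoubleBuffer samples = bytes.asDoubleBuffer();
            int n = samples.limit();
            for (int i = 0; i < n; i++) {
                samples.put(i, samples.get(i) * factor);
            }
            bytes.force(); // ask the OS to write the changes back to the file
            return n;
        }
    }

    public static void main(String[] args) throws IOException {
        int n = scaleInPlace(args[0], 0.5);
        System.out.println("Scaled " + n + " samples");
    }
}
```

Unlike the stream version, this gives you indexed access to any sample, and the mapped data lives outside the Java heap, so -Xmx does not have to cover it.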
You may not be able to map() the entire file at once (a single mapping is limited to Integer.MAX_VALUE bytes, about 2 GB, and your OS and hardware may allow less), in which case you may have to map as much as you can at once, process it, and then map the next section of the file. You can reuse the same FileChannel for each call to map(). Using NIO and memory mapping can be a bit more complex to figure out, but for large files it can be more efficient. And depending on what sort of processing you're doing, it may be helpful to have a large chunk of data in memory at one time.
The easy solution would be to increase the heap like Jim says.
What about divide and conquer? Put a size check on your array when you do the calculation: as you fill the first array, once it reaches a certain size you make a second array and put your results in there. Repeat this step until all the calculations are done.
You can also create an object to store the results (one that wraps around the array or collection; call it CalculationResultCollection or something). When one CalculationResultCollection is full, serialize it or store it somehow (in a file, like Jim suggests?), thus freeing the memory needed for the next group of calculation results.
Once you have all these separate collections you can then store them separately, or process them separately.
As Zenikko suggested, combine both approaches.
-- Divide your 25 million into, say, 25 chunks of 1 million each. Keep 1 million at a time in memory and, once finished with them, write them to a file or a database temp table. Temp tables can be cleared out once you're done.
-- If you need a quick solution, increase the JVM's heap size (-Xmx) and, if required, the stack size as well (-Xss), i.e. if you get a StackOverflowError.
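The chunked approach from the first point could look something like the sketch below. It assumes the input is a file of raw big-endian doubles and that results go to a second file rather than a temp table. The names ChunkedProcessor, readChunk and processChunk are invented for this example, and processChunk (which subtracts the chunk's mean) is only a placeholder for a calculation that needs a whole chunk in memory.

```java
import java.io.BufferedInputStream;
import java.io.BufferedOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.EOFException;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;

class ChunkedProcessor {

    // Placeholder calculation: subtract the chunk's mean from each of
    // its first `length` samples.
    static void processChunk(double[] chunk, int length) {
        double sum = 0;
        for (int i = 0; i < length; i++) sum += chunk[i];
        double mean = sum / length;
        for (int i = 0; i < length; i++) chunk[i] -= mean;
    }

    // Fills the array from the stream; returns how many doubles were read
    // (less than chunk.length only when the input runs out).
    static int readChunk(DataInputStream in, double[] chunk) throws IOException {
        int n = 0;
        try {
            while (n < chunk.length) {
                chunk[n] = in.readDouble();
                n++;
            }
        } catch (EOFException endOfData) {
            // short final chunk; n already counts the complete doubles read
        }
        return n;
    }

    // Processes the input one chunk at a time. Returns the number of chunks.
    static int processInChunks(String inPath, String outPath, int chunkSize)
            throws IOException {
        double[] chunk = new double[chunkSize]; // one array, reused for every chunk
        int chunks = 0;
        try (DataInputStream in = new DataInputStream(
                 new BufferedInputStream(new FileInputStream(inPath)));
             DataOutputStream out = new DataOutputStream(
                 new BufferedOutputStream(new FileOutputStream(outPath)))) {
            int n;
            while ((n = readChunk(in, chunk)) > 0) {
                processChunk(chunk, n);
                for (int i = 0; i < n; i++) out.writeDouble(chunk[i]);
                chunks++;
            }
        }
        return chunks;
    }
}
```

With a chunk size of 1 million, calling processInChunks(inPath, outPath, 1000000) keeps only about 8 MB of samples on the heap at any moment, however long the recording is.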