I have a problem with a large ResultSet that needs processing before writing the data back into the database.
I'm working with an oracle 11 database and need to process 53Million entries. This can take some time and my usual approach is to read the data into a Vector of records and then process them but this time I unfortunately run out of memory before all records are read. Increasing -Xmx1500m makes the out of memory message appear somewhat later. I haven't tried increasing memory because I read that it won't make a difference over 1.5Gb. is that true?
If I process the records as soon as they arrive I run the risk of having my connection interrupted (happened before).
I'm thinking about maybe writing the records to a local file and processing them from that file but that would defeat the purpose of using a database.
Also wondering if I could split the records into groups of 1 million...
If you do need Java, you could use batching and process in smaller groups as you noted.
I have not considered a stored procedure because I'm more comfortable with java. There are a few variables and "buckets" or HashMap that I would like to use and I don't know what stored procedures can offer me but i will look into it.
I might write it in java first and see how far I come converting it to a stored procedure.
Have you thought about using cursors? This would eliminate the problems of running out of memory after you send your query, it returns your results one at a time allowing you to do your processing for each.
I have to say I have only used them once and it was in a hibernate app, but we did use it to get round a similar issue. Google for java oracle cursor and there were a few examples.
Do you really have to fetch so many records from the database? You can help refine the data by adding where clause..
If yes, why don't you fetch some records and process them, and then may be fetch the next lot of records..
I am sure this will look stable, ofcourse, you are not going to display whole result of the records at once on the screen.
I was not expecting this much help and I am grateful for all your support.
I decided to go with Martin Vajsar's reply, suggesting writing a stored procedure in java and uploading it to the Database. This seems like the perfect solution for me if it works. I worked through the example in the link which worked perfectly (I had to set "javac -target 1.5 Oscar.java" because thats the jvm version in Oracle). I also like the Java Oracle Pointer idea and if that doesn't work, I will go through the rest of the suggestions. Thank you again for all the help, I am confident that I can complete my assignment faster now, using every one of your suggestions.
That would mean you will be using the resources of the database. So, you would eventually get the out of memory error again(probably a bit late as the db will have more memory to spare).
It would be better if you can use cursors in oracle procedures. That would solve your issues.
I understood that Mr. Wolf needed to load all records to memory because processing them takes time in which the SQL query timed out. Moving the processing to database might help to avoid that timeout. I assume that in this case the records will be processed "on the fly" without reading them all to memory. In the opposite case (reading all records to memory) your note applies. I hinted to use Java stored procedures as there were other suggestions that stored procedure approach could work.
I however do not understand the suggestion to use cursors. Unless I'm grossly mistaken, everything that returns rows in Oracle is a cursor. Opening an SQL query with prepared statement creates a cursor in Oracle database that is no different from, say, a ref cursor returned from a procedure. So if there are problems with cursors returned from SQL query described in the initial post, I assume the same problems would appear with any other method that creates the same or similar cursor. I'm not that familiar with JDBC though, and if (and only if) these things are processed differently by JDBC, then these suggestions might work.
I think the inital problem was this.
I'm working with an oracle 11 database and need to process 53Million entries. This can take some time and my usual approach is to read the data into a Vector of records and then process them but this time I unfortunately run out of memory before all records are read.
So, the data is being loaded into a Vector and then processed. So, if the concept is to use a java stored procedure in the database, that might mean (As I understand it. Mr. Wolf should correct me if I am wrong) putting all this code in a class and placing it in the database. So instead of performing this operation in the pc/server it is done in the db. So, the issue will still be there, as processing 53 mil records in a vector will always eat a lot of memory.
Regarding the remaining.
you are right
Unless I'm grossly mistaken, everything that returns rows in Oracle is a cursor.
Opening an SQL query with prepared statement creates a cursor in Oracle database that is no different from, say, a ref cursor returned from a procedure.
As I understand it, the issue was not with the query but with the processing. Even if the data is used directly (not stored in a vector), the processing speed for a cursor in the DB procedure will be far more superior to a java program executing elsewhere. I am saying this out of personal experience, as I had to remove my java code and use procedures for processing of large volumes of records(of course I used java to call that procedure ). Db stored procedures will almost always have a better performance when compared to implementing the same logic java through jdbc. You are esentially eliminating a large number of IO calls and initiating just 1 IO call to start processing entirely in the DB.
So if there are problems with cursors returned from SQL query described in the initial post, I assume the same problems would appear with any other method that creates the same or similar cursor.
Hope it clarifies.