Hibernate in a batch program suggestion

 
Ranch Hand
Posts: 312
Hi All,

I have to run a batch program that processes 100k records.
The Java program fetches the 100k records from the database and processes each one. Once processing is complete, the program updates the record in the database with its processing status.
We are using the Hibernate O/R mapping framework for all database interactions.

I am trying to figure out the best approach to tackle this. The first approach is to load all 100k records via a Hibernate Criteria list, carry out the Java processing, and then update each record in the database with its status. The problem is that this runs into an out-of-memory error, because each record is of considerable size and comprises several data columns.
The other approach I tried is to load only the columns required for processing (using Hibernate Projections) for the 100k records. This solves the out-of-memory problem from the first approach, but there is a downside. Since I am only fetching selected columns, I end up with a partial object rather than a full Hibernate entity, and that is not good enough for persisting back to the database. So after processing each record I have to reload the full Hibernate entity in order to carry out the update. This works, but it is a significant performance overhead, considering it means 100k extra database round trips.

Please suggest what would be a good approach for handling the aforementioned problem.

Thanks,
M
 
author and cow tipper
Posts: 5000
I'm not even sure Hibernate is the right tool here. Batch processing can typically be done more effectively and efficiently with native database tools, ETL programs, and other options. Does it have to be an ORM tool?

-Cameron McKenzie
 
Ranch Hand
Posts: 200
Do you really need all rows in memory at the same time? Can't you divide the data into logically connected parts that are independent of each other? How much memory do you have? How many fields does your table have, and how big are they? I don't think holding 100K rows should be a problem in itself.
If you don't need everything in memory, you can use scrollable result sets. And don't forget to flush() and clear() the session from time to time (every 100 to 1000 rows).
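A minimal sketch of that loop shape follows. To keep it runnable without Hibernate on the classpath, `UnitOfWork` and the `Iterator` are hypothetical stand-ins (not Hibernate types): in real code the cursor would be the `ScrollableResults` returned by `criteria.scroll(ScrollMode.FORWARD_ONLY)`, read with `next()` and `get(0)`, and `flush()`/`clear()` would be the calls on the Hibernate `Session`. The batch size of 2 in `main` is only for demonstration.

```java
import java.util.Iterator;
import java.util.List;
import java.util.function.Consumer;

/** Hypothetical stand-in for the two Session methods this pattern relies on. */
interface UnitOfWork {
    void flush(); // Hibernate: session.flush() writes pending updates to the database
    void clear(); // Hibernate: session.clear() evicts loaded entities from the session cache
}

class ScrollBatch {
    /**
     * Processes rows one at a time and flushes/clears every batchSize rows,
     * so the session never holds more than batchSize entities.
     * Returns the number of rows processed.
     */
    static <T> int run(Iterator<T> cursor, UnitOfWork session,
                       int batchSize, Consumer<T> processor) {
        int count = 0;
        while (cursor.hasNext()) {
            processor.accept(cursor.next()); // business logic plus status update
            if (++count % batchSize == 0) {
                session.flush();
                session.clear();
            }
        }
        // Flush whatever is left from the last, possibly partial, batch.
        session.flush();
        session.clear();
        return count;
    }

    public static void main(String[] args) {
        int[] flushes = {0};
        UnitOfWork fakeSession = new UnitOfWork() {
            public void flush() { flushes[0]++; }
            public void clear() { }
        };
        List<Integer> rows = List.of(1, 2, 3, 4, 5);
        int processed = ScrollBatch.<Integer>run(rows.iterator(), fakeSession, 2, r -> { });
        System.out.println(processed + " rows, " + flushes[0] + " flushes");
    }
}
```

Without the periodic clear(), every entity read through the cursor would stay attached to the session, which is likely to bring back the memory problem even though the rows are fetched incrementally.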
 
manish ahuja
Ranch Hand
Posts: 312
@Cameron
The batch functionality is an extension of our current on-demand Java application. The Java app contains the business processing needed for the records that are part of the batch operation, so invoking the Java app from the batch is unavoidable.

@Christian
No, there is no need to have all 100k records in memory at the same time.
Another way to get around the memory issue (approach 1) is to divide the batch into smaller chunks, say 10k records at a time, and load and process each chunk. This works well, but because we limit the result set to 10k [criteria.setMaxResults(10000)] per run, the batch utility has to be executed 10 or more times. In that case, how do I determine how many iterations the loop needs? Also, can you elaborate on the scrollable result set suggestion?
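One possible way to avoid fixing the number of iterations up front is to keep fetching chunks until one comes back shorter than the chunk size. The sketch below assumes a hypothetical `fetchPage(firstResult, maxResults)` callback so it runs without Hibernate; with Criteria it would wrap `criteria.setFirstResult(first).setMaxResults(max).list()`. The 25-row in-memory "table" is made up for the demonstration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.BiFunction;
import java.util.function.Consumer;

class ChunkedBatch {
    /**
     * Fetches and processes chunkSize rows at a time until a short (or empty)
     * chunk signals the end of the data. Returns the total rows processed.
     * fetchPage is a hypothetical callback taking (firstResult, maxResults).
     */
    static <T> int processAll(BiFunction<Integer, Integer, List<T>> fetchPage,
                              int chunkSize, Consumer<T> processor) {
        int first = 0;
        int total = 0;
        while (true) {
            List<T> page = fetchPage.apply(first, chunkSize);
            for (T row : page) {
                processor.accept(row);
            }
            total += page.size();
            if (page.size() < chunkSize) {
                break; // last chunk reached, no row count needed in advance
            }
            first += chunkSize;
        }
        return total;
    }

    public static void main(String[] args) {
        List<Integer> table = new ArrayList<>();
        for (int i = 0; i < 25; i++) {
            table.add(i);
        }
        // In-memory stand-in for a paged query.
        BiFunction<Integer, Integer, List<Integer>> fetchPage = (first, max) ->
                table.subList(Math.min(first, table.size()),
                              Math.min(first + max, table.size()));
        int total = ChunkedBatch.<Integer>processAll(fetchPage, 10, row -> { });
        System.out.println("processed " + total + " rows");
    }
}
```

Two caveats with this approach: offset paging needs a stable ordering (for example by primary key), and if the query filters on the very status that processing updates, the result set shrinks as you go, in which case the offset should probably stay at 0 rather than advance.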



 
Ranch Hand
Posts: 36
Take a look at this: http://docs.jboss.org/hibernate/stable/core/reference/en/html/batch.html
 