Win a copy of The Little Book of Impediments (e-book only) this week in the Agile and Other Processes forum!
  • Post Reply
  • Bookmark Topic Watch Topic
  • New Topic

batch inserting to the database

 
Abhishek Mish
Greenhorn
Posts: 29
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
Hi All,
My requirements is that i need to read a csv file which may contains huge data ie numbers of rows may be very huge.
After reading the file, I need to parse the file and do the validations based on some configuration which is stored in the database.
After suceesfull validation, I need to insert all the records in the database.
I think instead of commiting per data in a loop in a single transaction which can take long time and can cause the request time out or session time out as this will be synchronous request.
If data are commited in batch this can reduce the time as compared to per data basics. Commiting data in batch can also not confirm the request time out or session time out if size of the file is very huge. Also there is one more problem if few batch data are commited successful and with few batch there is some problem partial data will be commited as this is not done in single transaction.
Kindly suggest what would be the best possible solution to insert or update data in batch to avoid above mentioned problems.
Using ORM- EclipseLink(JPA) and Spring.

Thanks,
Abhishek
 
Paul Sturrock
Bartender
Posts: 10336
Eclipse IDE Hibernate Java
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator

I think instead of commiting per data in a loop in a single transaction which can take long time and can cause the request time out or session time out as this will be synchronous request.

It doesn't make any sense doing a bulk load in a single transaction.


Using ORM- EclipseLink(JPA) and Spring.

My suggestion would be to not use these. Use your database's bulk loading tool, it is usually far better suited to this sort of operation than an ORM.
 
Abhishek Mish
Greenhorn
Posts: 29
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
Thanks for the response..
I think what i understand is execuitng SQLLDR command from java class which will be faster.
But my concern is after executing sqlldr, I need to get the execution type so that i can show the message in the UI that it is successful or failed.
But still i think using this approach can add to lot of user firendly limitations.

Thanks,
Abhishek
 
Rahul Babbar
Ranch Hand
Posts: 210
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
i was just wondering why dont you just read the data from the csv, put a loop, start reading/validating/inserting/updating the records in a transaction and after every 1000th(or any other No) record is processed, commit the transaction and start a new one....if you need to process more records...
it may involve some transaction coding, and you will have full control over what you do...
 
Paul Sturrock
Bartender
Posts: 10336
Eclipse IDE Hibernate Java
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
Abhishek Mish wrote:Thanks for the response..
I think what i understand is execuitng SQLLDR command from java class which will be faster.
But my concern is after executing sqlldr, I need to get the execution type so that i can show the message in the UI that it is successful or failed.
But still i think using this approach can add to lot of user firendly limitations.

Thanks,
Abhishek


This is a bulk load yes? Its never going to be that user friendly operation; its only ever done be DBAs and simmilar people, because ultimately it may require some DBA style tasks to do it (disabling constraints, using parallel threads, using direct loading etc.). Are you anticipating your bulk load will be performed by people who won't be able to understand SQLLDR's output?

This aside your biggest headache is the requirement to commit only when all the records have been validated. If the data set is of any significant size wrapping the whole thing in a single transaction is a non-starter in any loading tool. And since some of your validation is from data in the database itself the longer the load time the more likely the data will be stale before being commited. I think you may need to apply more control to this by using staging tables; upload the data outside of a transaction to staging tables, validate the data in the database, then perform your insert based on the data in the staging tables. To do this you may have to lock the data you are validating with in some way. Not nice I'll admit, but a transactional bulk load is hard.

Just out of curiosity, why does this need to be transactional? It's not a normal requirement for a bulk load of any significant size.
 
Abhishek Mish
Greenhorn
Posts: 29
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
Hi Rahul,
Thanks for the reply, even i thought of this solution but the biggest problem in that case is, if data are committed in batch with new transaction for every batch data instead of whole data getting commited in single transaction.
The problem is it will commit data partially if there is some exception scenario as every batch will have new transaction.
Please let me know if i understand correctly.

Thanks,
Abhishek
 
Rahul Babbar
Ranch Hand
Posts: 210
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
There are a few things...
1) You cannot commit the data in whole transaction because there are too many records and you might(most probably will) get a transaction timeout or any other related DB error.....So, having a single transaction is ruled out..
2) Having multiple transactions: if you get an error in some transactions, you might want to have a strategy for rolling back the changes manually for the previous transactions(if you can somehow track them).....Like if you are inserting data in the table, and know the Ids for which you are inserting the data from the start of the first transaction, you might want to delete all that data which was inserted after a transaction fails.....There migt be other similar workarounds....depending on your requirements....

Currently, i cannot think of any other way...because doing the operation even in a SQL procedure will give the same problems.....
 
Abhishek Mish
Greenhorn
Posts: 29
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
Paul and Rahul thanks for the suggestion.

Thanks,
Abhishek
 
  • Post Reply
  • Bookmark Topic Watch Topic
  • New Topic