Win a copy of The Little Book of Impediments (e-book only) this week in the Agile and Other Processes forum!
  • Post Reply
  • Bookmark Topic Watch Topic
  • New Topic

mysql jdbc batch inserts and updates into different tables

 
Aadil Shaik
Greenhorn
Posts: 25
1
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
Hello:

I need to insert millions of records into mysql database. I want to use the batch execute option. But my SQL statements are for different tables and they should execute in the same sequence.

Table 1 insert 1 record
Table 2 insert 1 record using the Table 1's id (key)
Table 3 insert 20 records using the Table 2's id
Table 1 insert 1 record
Table 2 insert 1 record using the Table 1's second record's id
Table 3 insert 20 records using the Table 2's second record's id

right now i am executing one statement at a time. How to change this into batch execution to save time.

Thanking you Aadil
 
Brian Tkatch
Bartender
Posts: 567
25
Linux Notepad Oracle
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
Aadil Shaik wrote:Hello:

I need to insert millions of records into mysql database. I want to use the batch execute option. But my SQL statements are for different tables and they should execute in the same sequence.

Table 1 insert 1 record
Table 2 insert 1 record using the Table 1's id (key)
Table 3 insert 20 records using the Table 2's id
Table 1 insert 1 record
Table 2 insert 1 record using the Table 1's second record's id
Table 3 insert 20 records using the Table 2's second record's id

right now i am executing one statement at a time. How to change this into batch execution to save time.

Thanking you Aadil


There's a few ways of doing this, and all of them depend on what's best for your specific scenario. Nonetheless, the easiest method would be to use stored procedure and LOAD FILE to read the records into the database. For millions of records, an external procedure (java) would be a bottleneck.
 
Ron McLeod
Bartender
Pie
Posts: 1049
65
Android Eclipse IDE Java Linux MySQL Database Redhat
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
Aadil Shaik wrote:Table 1 insert 1 record
Table 2 insert 1 record using the Table 1's id (key)
Table 3 insert 20 records using the Table 2's id

Are the ids for Table 1 and Table 2 generated by the database or provided by the application?
 
Aadil Shaik
Greenhorn
Posts: 25
1
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
ids are the sequence numbers generated by the database.
 
Roel De Nijs
Sheriff
Posts: 10662
144
AngularJS Chrome Eclipse IDE Hibernate Java jQuery MySQL Database Spring Tomcat Server
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
Aadil Shaik wrote:right now i am executing one statement at a time. How to change this into batch execution to save time.

Is there any possibility to insert the records table per table? So first insert all records into table 1, then insert all records into table 2 and finally insert all records into table3?
 
Aadil Shaik
Greenhorn
Posts: 25
1
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
Roel De Nijs wrote:
Aadil Shaik wrote:right now i am executing one statement at a time. How to change this into batch execution to save time.

Is there any possibility to insert the records table per table? So first insert all records into table 1, then insert all records into table 2 and finally insert all records into table3?


I need to execute the insert statements in a sequence.

I am thinking about doing this way.



May be not a good option but i will continue to think for the best solution.
 
Brian Tkatch
Bartender
Posts: 567
25
Linux Notepad Oracle
  • Likes 1
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
Aadil Shaik wrote:
Roel De Nijs wrote:
Aadil Shaik wrote:right now i am executing one statement at a time. How to change this into batch execution to save time.

Is there any possibility to insert the records table per table? So first insert all records into table 1, then insert all records into table 2 and finally insert all records into table3?


I need to execute the insert statements in a sequence.

I am thinking about doing this way.



May be not a good option but i will continue to think for the best solution.


That is not a very good method. The new id might not be the max, for a number of reasons. And even one mistake can mess up a lot of records.

Instead, use LAST_INSERT_ID(), which is session specific.
 
Ron McLeod
Bartender
Pie
Posts: 1049
65
Android Eclipse IDE Java Linux MySQL Database Redhat
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
Using Statement.getGeneratedKeys() will save a round-trip to the MySQL database - for example:
 
Brian Tkatch
Bartender
Posts: 567
25
Linux Notepad Oracle
  • Likes 1
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
Ron McLeod wrote:Using Statement.getGeneratedKeys() will save a round-trip to the MySQL database - for example:


Better to do the batch and make no round trips (*or as few as possible).
 
Roel De Nijs
Sheriff
Posts: 10662
144
AngularJS Chrome Eclipse IDE Hibernate Java jQuery MySQL Database Spring Tomcat Server
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
Aadil Shaik wrote:I need to execute the insert statements in a sequence.

And is there any specific reason why these statements have to be executed in that sequence. Because the table-per-table option is probably the easiest solution to insert all records in batch (although using LAST_INSERT_ID() as suggested by Brian will do the job as well).
 
Aadil Shaik
Greenhorn
Posts: 25
1
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
Roel De Nijs wrote:
Aadil Shaik wrote:I need to execute the insert statements in a sequence.

And is there any specific reason why these statements have to be executed in that sequence. Because the table-per-table option is probably the easiest solution to insert all records in batch (although using LAST_INSERT_ID() as suggested by Brian will do the job as well).


The reason i need to execute the statements in sequence is because of the hierarchical data.



I get this data in a text file. It is in X12 format. I need to insert that data into database tables which have the foreign key relationships.
I need to use the database generated keys (ids) to link between the tables.
One way is execute statements one at a time, get the parent id and use it while creating the child record. But that process is very slow.
Another way is to use the batch execution mode. Prepare the sql statements and add them to the batch and execute once. But the problem is with the keys.
 
Aadil Shaik
Greenhorn
Posts: 25
1
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
Brian Tkatch wrote:
Ron McLeod wrote:Using Statement.getGeneratedKeys() will save a round-trip to the MySQL database - for example:


Better to do the batch and make no round trips (*or as few as possible).


I too want to use the batch execution mode. But the problem is with the keys. Unless i know the parent key i can't insert the child record. But my data is like
parent
child
parent
child
parent
child
.
.
.
.

I need to go back and forth with each insert statement.
 
Dave Tolls
Ranch Hand
Posts: 2110
16
  • Likes 1
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
Have you tried Brian's suggestion, using LAST_INSERT_ID() in your child and grandchild queries?

Though, thinking about it, I'm not sure how you'll track the ids down the chain.
You can do it for children, but the grandchildren might be an issue, looking at the documentation Brian linked to.

MySQL remembers the last insert id.
So, insert parent and LAST_INSERT_ID() will return that id for the inserts of the first child. However, it will then hold the ID of that child.
OK, we can fix that by using the IGNORE keyword, but even then (and is that actually advisable?) you have the issue of the grandchildren, so you need to child ID.

I might be misreading the docs here, so maybe Brian can expand on how he thinks this should be used.
 
Brian Tkatch
Bartender
Posts: 567
25
Linux Notepad Oracle
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
Dave Tolls wrote:Though, thinking about it, I'm not sure how you'll track the ids down the chain.


Variables.
 
Dave Tolls
Ranch Hand
Posts: 2110
16
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
Brian Tkatch wrote:
Dave Tolls wrote:Though, thinking about it, I'm not sure how you'll track the ids down the chain.


Variables.


Well, yes, but I thought we were talking batching here, as in the addBatch method.
Can't use a variable there as the batch hasn't run.

Personally, as you suggest right at the start, I wouldn't even consider Java as a suitable thing for this sort of massive data load.
 
Brian Tkatch
Bartender
Posts: 567
25
Linux Notepad Oracle
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
Dave Tolls wrote:
Brian Tkatch wrote:
Dave Tolls wrote:Though, thinking about it, I'm not sure how you'll track the ids down the chain.


Variables.


Well, yes, but I thought we were talking batching here, as in the addBatch method.
Can't use a variable there as the batch hasn't run.


SELECT LAST_INSERT_ID(); would return the value to the local batch.
 
Dave Tolls
Ranch Hand
Posts: 2110
16
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
OK, so what are your batches doing?

I'd be interested to see a sample.

Take the OPs problem:


As I said, I can see how LAST_INSERT_ID would work for the first child, and the first grandchild, but not the second.
 
Brian Tkatch
Bartender
Posts: 567
25
Linux Notepad Oracle
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
Dave Tolls wrote:OK, so what are your batches doing?

I'd be interested to see a sample.

Take the OPs problem:


As I said, I can see how LAST_INSERT_ID would work for the first child, and the first grandchild, but not the second.


I do not have mysql installed, nor enough experience with Java. If returning to Java between actions, there should be a query that simply returns the value as if it were a table's column: SELECT LAST_INSERT_ID(); would create a result set that can be read in by the java code and stored in a variable. This variable can be used in subsequent queries via placeholders in the prepared statement.

 
Dave Tolls
Ranch Hand
Posts: 2110
16
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
The only thing that will do that is the sequence of INSERT statements with retrieving generated keys.
Well, you could do a separate SELECT, but that would involve more round trips (INSERT, SELECT, INSERT etc).
 
Brian Tkatch
Bartender
Posts: 567
25
Linux Notepad Oracle
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
Dave Tolls wrote:The only thing that will do that is the sequence of INSERT statements with retrieving generated keys.
Well, you could do a separate SELECT, but that would involve more round trips (INSERT, SELECT, INSERT etc).

Exactly. And that is another reason why doing it completely in the database is a better idea.

FWIW, Oracle has RETURNING clause for INSERT statement, which removes the need for a separate statement. And, it supports bulk inserts too.

Though, on second thought, the return to java might not be required. The second insert would also generate a id, which itself can be used to grab the first id inside the insert statement.
Where column tab1 is the FK to tab1 from tab2.
 
Dave Tolls
Ranch Hand
Posts: 2110
16
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
I think Oracles RETURNING mechanism is what its JDBC driver uses for the generated keys functionality, under the hood. Though don't quote me on that!


Still not convinced re: the use of LAST_INSERT_ID in JDBC batches for this particular problem...but I haven't got anything to try it out on, so it's a fair bit of "gut feeling" on my part, and that's not always reliable.
 
Brian Tkatch
Bartender
Posts: 567
25
Linux Notepad Oracle
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
Dave Tolls wrote:I think Oracles RETURNING mechanism is what its JDBC driver uses for the generated keys functionality, under the hood. Though don't quote me on that!

Oops!

Dave Tolls wrote:Still not convinced re: the use of LAST_INSERT_ID in JDBC batches for this particular problem...but I haven't got anything to try it out on, so it's a fair bit of "gut feeling" on my part, and that's not always reliable.

Without seeing the DB and code, it's hard to tell. I'd opt for an data import, if possible. That is always the fastest option.
 
Aadil Shaik
Greenhorn
Posts: 25
1
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
thank you all for your responses.

This is how i implemented, 80% batch execution and 20% individual execution.
Where ever i need to retrieve the id from the database its an individual execution and for the statements which can run independently and don't need their keys are batched.
It is about 4 times faster than the all individual execution.
 
Roel De Nijs
Sheriff
Posts: 10662
144
AngularJS Chrome Eclipse IDE Hibernate Java jQuery MySQL Database Spring Tomcat Server
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
Aadil Shaik wrote:This is how i implemented, 80% batch execution and 20% individual execution.
Where ever i need to retrieve the id from the database its an individual execution and for the statements which can run independently and don't need their keys are batched.

So I assume all parent and child records are inserted individually and all grand child records are inserted in batch...
 
Aadil Shaik
Greenhorn
Posts: 25
1
  • Likes 2
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
Roel De Nijs wrote:
Aadil Shaik wrote:This is how i implemented, 80% batch execution and 20% individual execution.
Where ever i need to retrieve the id from the database its an individual execution and for the statements which can run independently and don't need their keys are batched.

So I assume all parent and child records are inserted individually and all grand child records are inserted in batch...


Yes. all parent and some of the child records are inserted individually so that i can get the keys in the java program and the child records for which i don't need keys and all grand child records are executed in batch mode.
But later i will use a technique to generate global unique keys within java that way i can do 100% batch.
 
Roel De Nijs
Sheriff
Posts: 10662
144
AngularJS Chrome Eclipse IDE Hibernate Java jQuery MySQL Database Spring Tomcat Server
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
Aadil Shaik wrote:Yes. all parent and some of the child records are inserted individually so that i can get the keys in the java program and the child records for which i don't need keys and all grand child records are executed in batch mode.

Thanks for coming back and sharing your solution. That might be (very) helpful for other ranchers Have a cow!
 
  • Post Reply
  • Bookmark Topic Watch Topic
  • New Topic