
Inserting 1 million records using JDBC batching in Oracle

 
mozammil muzza
Ranch Hand
Hi,

I am trying to run an application that inserts 1 million records into a DB table with 7 columns, with 1 PK, 1 FK, and 3 unique index constraints on it.

I am using PreparedStatement and JDBC batching for this, and after every 2000 rows I run executeBatch().

Here, for half a million records it is taking almost 3 minutes, i.e. 182 secs.

Please tell me whether there is any faster way of inserting the data into the DB.

thanks
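For reference, a minimal sketch of the pattern described above. The table name, columns, JDBC URL, and row representation are hypothetical placeholders, not from the original post; the small helper just counts how many executeBatch() calls a given load implies.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;
import java.util.List;

// Sketch of batched inserts via PreparedStatement; names are hypothetical.
public class BatchInsertSketch {

    static final int BATCH_SIZE = 2000;

    // Pure helper: number of executeBatch() calls needed for totalRows rows.
    static int executeBatchCalls(int totalRows, int batchSize) {
        return (totalRows + batchSize - 1) / batchSize; // ceiling division
    }

    static void insertAll(Connection conn, List<Object[]> rows) throws SQLException {
        String sql = "INSERT INTO my_table (col1, col2, col3) VALUES (?, ?, ?)";
        conn.setAutoCommit(false); // avoid an implicit commit per statement
        try (PreparedStatement ps = conn.prepareStatement(sql)) {
            int count = 0;
            for (Object[] row : rows) {
                for (int i = 0; i < row.length; i++) {
                    ps.setObject(i + 1, row[i]);
                }
                ps.addBatch();
                if (++count % BATCH_SIZE == 0) {
                    ps.executeBatch(); // flush every 2000 rows
                }
            }
            ps.executeBatch(); // flush the remainder
        }
        conn.commit();
    }
}
```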
 
Maneesh Godbole
Saloon Keeper
Hardly a beginner's topic. Moving to a more appropriate forum.
 
Winston Gutkowski
Bartender
mozammil muzza wrote:I am trying to run an application that inserts 1 million records into a DB table with 7 columns, with 1 PK, 1 FK, and 3 unique index constraints on it.
I am using PreparedStatement and JDBC batching for this, and after every 2000 rows I run executeBatch().
Here, for half a million records it is taking almost 3 minutes, i.e. 182 secs.

Well,

First: have you considered using a native DB load facility?

Second: Have you considered re-designing your table? 4 unique indexes (including, presumably, PK) on a table with 7 columns suggests to me that it isn't properly rationalized.

Third: I suggest you look at this thread and, in particular, Jayesh's response; because it's pretty much what I'd advise.

Winston

[Edit] Updated. Thanks Jeff.
 
mozammil muzza
Ranch Hand
Winston Gutkowski wrote:Second: Have you considered re-designing your table? 4 unique indexes (including, presumably, PK) on a table with 7 columns suggests to me that it isn't properly rationalized.

I tried the same thing in another DB table with the same structure; there it takes hardly 37 secs for 600K records. That's why I am a bit confused about it.
 
Winston Gutkowski
Bartender
mozammil muzza wrote:I tried the same thing in another DB table with the same structure; there it takes hardly 37 secs for 600K records. That's why I am a bit confused about it.

Me too, because you haven't given us any information that could help. Notwithstanding, my point stands: I've been dealing with databases for more than twenty years and can count on the fingers of one hand the tables that rationally required 4 unique indexes; and none of them had only 7 columns.

But maybe you can prove me wrong.

Winston
 
Jeff Verdegan
Bartender
Winston Gutkowski wrote:
Third: I suggest you look at this thread and, in particular, Junilu's response;


Jayesh's response perhaps?
 
Martin Vajsar
Sheriff
I'd say Winston has already raised the main points (that is, to consider using a data load tool, in this case SQL*Loader or perhaps external tables), and to review the table and its indexes (index maintenance is quite expensive, as compared to the insert operation itself).

So it's just the minor points left:

1) Do some experiments to find which batch size works best in your environment. 2000 could even be too much.

2) Make sure you're using the latest JDBC driver for Oracle. Oracle has made some performance enhancements in JDBC driver batch processing which came with the 11g database version (looking just at the filename is not enough, various versions of the driver are named ojdbc5.jar or ojdbc6.jar).
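Both points above can be sketched in code. This is a hypothetical harness, not from the thread: point 1 as a batch-size sweep where the actual JDBC load is passed in as a callback, and point 2 using the standard DatabaseMetaData API to ask the driver for its real version instead of trusting the jar file name.

```java
import java.sql.Connection;
import java.sql.SQLException;
import java.util.function.IntConsumer;

// Hypothetical tuning helpers; only the idea of varying the batch size
// comes from the thread.
public class BatchTuning {

    // Point 1: run the same load with several candidate batch sizes and
    // record the elapsed time of each, so the best one can be picked.
    static long[] sweep(int[] batchSizes, IntConsumer loadWithBatchSize) {
        long[] elapsedNanos = new long[batchSizes.length];
        for (int i = 0; i < batchSizes.length; i++) {
            long start = System.nanoTime();
            loadWithBatchSize.accept(batchSizes[i]); // e.g. the JDBC load
            elapsedNanos[i] = System.nanoTime() - start;
        }
        return elapsedNanos;
    }

    // Point 2: ask the driver itself which version it is.
    static String driverVersion(Connection conn) throws SQLException {
        return conn.getMetaData().getDriverVersion();
    }
}
```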
 
mozammil muzza
Ranch Hand
Jeff Verdegan wrote:
Winston Gutkowski wrote:
Third: I suggest you look at this thread and, in particular, Junilu's response;


Jayesh's response perhaps?


I do not have provision for using SQL*Loader in my apps, so that thread is not really much help.


Martin Vajsar wrote:I'd say Winston has already raised the main points (that is, to consider using a data load tool, in this case SQL*Loader or perhaps external tables), and to review the table and its indexes (index maintenance is quite expensive, as compared to the insert operation itself).

So it's just the minor points left:

1) Do some experiments to find which batch size works best in your environment. 2000 could even be too much.


I tested with several batch sizes, but did not achieve the level I want. I even tried the Oracle batching mechanism on top of it, but not much improvement.

2) Make sure you're using the latest JDBC driver for Oracle. Oracle has made some performance enhancements in JDBC driver batch processing which came with the 11g database version (looking just at the filename is not enough, various versions of the driver are named ojdbc5.jar or ojdbc6.jar).


I am using the latest JDBC driver, 11.x.x, and checked in its Manifest file, BTW.


Anything else I can try? Thanks for your reply.
 
Carles Gasques
Ranch Hand
Hi,

One path to explore is to drop the index/indexes, insert all your data, and then rebuild the index/indexes.
That should give you a boost.

Cheers,
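Sketched as Oracle SQL (index, table, and column names are hypothetical):

```sql
-- Hypothetical names. Dropping a unique index before the load trades the
-- validity check during insertion for speed; the CREATE at the end will
-- fail if the loaded data contains duplicates.
DROP INDEX my_table_uk1;
-- ... run the bulk insert here ...
CREATE UNIQUE INDEX my_table_uk1 ON my_table (col2);

-- Alternatively, an existing index can be rebuilt in place to defragment it:
ALTER INDEX my_table_uk1 REBUILD;
```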
 
Winston Gutkowski
Bartender
Carles Gasques wrote:One path to explore is to drop the index/indexes, insert all your data, and then rebuild the index/indexes...

That occurred to me too; and it would definitely have been my advice if the indexes weren't unique - but one presumes they are unique for a reason (ie, probably as an automatic validity check).

If you drop, load and rebuild, you have two basic issues:
1. Values aren't verified on insertion.
2. Rejected index values are not likely to be in any predictable order (particularly as there are 4 indexes to satisfy).

My advice is still to look at that table's definition: it sounds "smelly" to me.

Winston
 
Carles Gasques
Ranch Hand
Hi,

I agree with your diagnosis, Winston.

But at least giving rebuilding the indexes a try before loading the data, and checking how performance evolves, will not do any harm.


Cheers,
 
mozammil muzza
Ranch Hand
Winston Gutkowski wrote:
Carles Gasques wrote:One path to explore is to drop the index/indexes, insert all your data, and then rebuild the index/indexes...

That occurred to me too; and it would definitely have been my advice if the indexes weren't unique - but one presumes they are unique for a reason (ie, probably as an automatic validity check).

If you drop, load and rebuild, you have two basic issues:
1. Values aren't verified on insertion.
2. Rejected index values are not likely to be in any predictable order (particularly as there are 4 indexes to satisfy).

My advice is still to look at that table's definition: it sounds "smelly" to me.

Winston


Well, the indexes are unique ones and I can't drop them before inserting records, as the data depends on them. It would make the data inconsistent.

 
Martin Vajsar
Sheriff
mozammil muzza wrote:Well, the indexes are unique ones and I can't drop them before inserting records, as the data depends on them. It would make the data inconsistent.

Unfortunately it looks like you've got conflicting requirements. If you do need to improve the insertion speeds, you may have to make compromises elsewhere.

Perhaps, if your database has enough resources you could use, you might insert the rows into a temporary (unindexed) table and then merge/insert them into the target table in parallel.

Since you've ruled out the obvious measures, it would be good to find out what the bottleneck of your current processing is and target that bottleneck specifically. Do you have some experience with performance tuning in Oracle?
 
Carles Gasques
Ranch Hand
I agree, of course, but give rebuilding the indexes a try and then run the load code again. Perhaps the indexes are too fragmented and unbalanced; rebuilding them could make a difference.

Best regards,
 
mozammil muzza
Ranch Hand
Martin Vajsar wrote:

Perhaps, if your database has enough resources you could use, you might insert the rows into a temporary (unindexed) table and then merge/insert them into the target table in parallel.

I tried the load insert as well, using a stage table; still not much improvement.
Can you please elaborate on merge/insert and its processing in parallel? Not sure, but I guess it might be helpful.


Since you've ruled out the obvious measures, it would be good to find out what the bottleneck of your current processing is and target that bottleneck specifically. Do you have some experience with performance tuning in Oracle?

Not really.
 
Martin Vajsar
Sheriff
mozammil muzza wrote:I tried the load insert as well, using a stage table; still not much improvement.
Can you please elaborate on merge/insert and its processing in parallel? Not sure, but I guess it might be helpful.

Employing parallel processing makes sense only if your database has plenty of available CPU and IO resources during your loads. If your parallel processing competes with other processes running in the database, it will only make life miserable for everyone (including you). Also, you might need some special privileges and/or quotas to be able to run parallel jobs, ask your DBA if in doubt (your DBA, assuming there is one, should know what you're going to do in the database. Parallel processing is usually not expected from a JDBC application.)

What I'd try:

1) Create a temporary table (probably with the on commit preserve rows option). Temporary tables generate slightly less undo, which is good, of course. Let me note - in case you haven't worked with Oracle's temporary tables yet - that the table should be created upfront and be part of your database schema; you don't create it as part of your data loading process. No indexes.

2) Before populating the table, truncate it to make sure it doesn't contain rows from previous runs (should be an instantaneous operation).

3) Populate the table using batching (no commits in between, of course), and using the APPEND_VALUES hint. Commit after the table is populated (it is necessary because of the APPEND_VALUES hint). Disclaimer: I'm not sure the APPEND_VALUES hint will have any effect when inserting into a temporary table, but personally I'd just try it with and without, and see what is faster.

Now, if the steps 2 and 3 don't take significantly less time than your current processing, it is not worth going further. Just halving the time might not be enough, in my opinion. We'll still have to copy all of the data into the target table, and that's gonna take some time even in parallel.

4) Insert data into the target table (a single insert into ... select statement), employing parallel processing. See this StackOverflow article for details; I haven't ever used parallel insert myself.

If there is a unique index violation, the insert will fail, and you won't have any indication which rows/values are the cause. But at least you'll have the option to rollback the transaction, and/or identify the duplicate values by querying the target and temporary table.
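The four steps above might look like this in Oracle SQL. All object names, column lists, and the parallel degree are hypothetical; step 3's INSERT is the statement the JDBC PreparedStatement would batch.

```sql
-- 1) Global temporary table, created once as part of the schema (no indexes):
CREATE GLOBAL TEMPORARY TABLE stage_rows (
  col1 NUMBER, col2 VARCHAR2(50), col3 DATE
) ON COMMIT PRESERVE ROWS;

-- 2) Before each run, clear leftovers from previous runs:
TRUNCATE TABLE stage_rows;

-- 3) Populate via JDBC batching; the statement the PreparedStatement batches:
INSERT /*+ APPEND_VALUES */ INTO stage_rows (col1, col2, col3) VALUES (?, ?, ?);
-- ...then COMMIT once, after the whole stage table is populated.

-- 4) Single parallel insert into the target:
ALTER SESSION ENABLE PARALLEL DML;
INSERT /*+ APPEND PARALLEL(target_table 4) */ INTO target_table
  SELECT col1, col2, col3 FROM stage_rows;
COMMIT;
```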
 
jake lopez
Greenhorn
Make your JDBC connection not auto-commit, and commit after every batch.
 
Arun Giridhar
Ranch Hand
This post might be really helpful for my current open source project. I will be using Postgres; I'm dealing with 3 lakh (300,000) records per 6 months.
 