
Adding millions of entries in database

 
Bogdan Baraila
Ranch Hand
Posts: 43
Hello all,

My application will need to parse some files which will generate more than 1 million records in the database.
Using Hibernate's StatelessSession with PostgreSQL, I've reached a time of 70 seconds for 1 million records (each entity has four character columns and a bigint primary key).
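For context, the insert loop looks roughly like this minimal sketch; the LogEntry entity and its fields are hypothetical stand-ins, not the actual code:

import javax.persistence.Entity;
import javax.persistence.Id;
import org.hibernate.SessionFactory;
import org.hibernate.StatelessSession;
import org.hibernate.Transaction;

@Entity
class LogEntry {                        // hypothetical entity: 4 character columns + bigint id
    @Id private Long id;
    private String col1, col2, col3, col4;
    void setId(Long id) { this.id = id; }
    void setCols(String a, String b, String c, String d) {
        col1 = a; col2 = b; col3 = c; col4 = d;
    }
}

public class BulkInsert {
    public static void insertAll(SessionFactory factory) {
        // StatelessSession skips the first-level cache and dirty checking,
        // which is why it suits straight bulk inserts.
        StatelessSession session = factory.openStatelessSession();
        Transaction tx = session.beginTransaction();
        try {
            for (long i = 0; i < 1000000; i++) {
                LogEntry e = new LogEntry();
                e.setId(i);
                e.setCols("a", "b", "c", "d");
                session.insert(e);      // issues the INSERT immediately
            }
            tx.commit();
        } finally {
            session.close();
        }
    }
}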

I have 2 questions:
1) Since I haven't worked with this much data before, do you think this is a good time?
2) Do you have any suggestions to improve this time?

Thanks.
 
William P O'Sullivan
Ranch Hand
Posts: 859
1. 70 seconds for 1 million records is not bad at all. Is this just the parsing process? That is just over a minute!

2. See #1

WP
 
Bogdan Baraila
Ranch Hand
Posts: 43
William P O'Sullivan wrote: Is this just the parsing process?


For now I have just created one million objects, similar to what I will get from the files, and saved them. I'm just concerned about the insert-into-database process. The reading from the file will be fast. If there are any problems I can use multithreading for the parsing, but the database insert (the Java object -> database row step) is usually the bottleneck.
 
Manuel Petermann
Ranch Hand
Posts: 177
Bogdan Baraila wrote:...and saved them. I'm just concerned about the inserting into database process...

Maybe it's just me, but I still don't get it. Is this time for parsing a file, saving it to the database, or both?
 
Bogdan Baraila
Ranch Hand
Posts: 43
Like the title says, it's just about inserting 1 million entities into the database.
 
William P O'Sullivan
Ranch Hand
Posts: 859
If you want help, you need to clarify:

Is your 70-second timing for the actual insert into Postgres (and committing) of 1,000,000 entries?

You say you are processing from memory only, so yes, your file I/O will add some overhead.

Be clear please.

WP
 
Bogdan Baraila
Ranch Hand
Posts: 43
Yes, 70 seconds is just the time for creating the objects in memory and saving them to my database.
 
William P O'Sullivan
Ranch Hand
Posts: 859
Is that in-memory? That is still very, very fast.

I would suggest modifying your code to create a test file from your 1,000,000 records.

Then modify again to read that file, and see what your baseline timing is.
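One rough way to set up that baseline (the file name and record format are made up for illustration):

import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.File;
import java.io.FileReader;
import java.io.FileWriter;
import java.io.IOException;

public class BaselineTiming {
    public static void main(String[] args) throws IOException {
        File f = new File("records.csv");

        // Step 1: dump 1,000,000 fake records to disk once.
        BufferedWriter w = new BufferedWriter(new FileWriter(f));
        for (int i = 0; i < 1000000; i++) {
            w.write(i + ",a,b,c,d");
            w.newLine();
        }
        w.close();

        // Step 2: time the read alone, with no database work at all.
        long start = System.nanoTime();
        int count = 0;
        BufferedReader r = new BufferedReader(new FileReader(f));
        while (r.readLine() != null) {
            count++;
        }
        r.close();
        System.out.println("Read " + count + " lines in "
                + (System.nanoTime() - start) / 1000000 + " ms");
    }
}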

WP
 
Bogdan Baraila
Ranch Hand
Posts: 43
Yes, it's in memory. I'm just instantiating the objects in a for loop, setting some values on them based on the loop index, and saving them to the database.
I know that the file processing will add some extra time, but right now I'm more concerned about the saving-to-database process (I have lots of ideas for how to speed up the file processing, but for the database saving this is all I've got so far).
 
Rajit vreddi
Greenhorn
Posts: 17
I suggest using batch processing: split the work and insert a batch of records on each iteration.
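If you use a regular Session instead of a StatelessSession, the usual batching pattern looks like this sketch (the batch size and the LogEntry entity from the earlier sketch are assumptions; hibernate.jdbc.batch_size should be set to a matching value):

import org.hibernate.Session;
import org.hibernate.SessionFactory;
import org.hibernate.Transaction;

public class BatchedInsert {
    private static final int BATCH_SIZE = 50;   // match hibernate.jdbc.batch_size

    public static void insertAll(SessionFactory factory) {
        Session session = factory.openSession();
        Transaction tx = session.beginTransaction();
        for (int i = 0; i < 1000000; i++) {
            LogEntry e = new LogEntry();        // hypothetical entity from the sketch above
            e.setId((long) i);
            e.setCols("a", "b", "c", "d");
            session.save(e);
            if (i % BATCH_SIZE == 0) {
                session.flush();                // push this batch of INSERTs to the database
                session.clear();                // detach the entities so memory stays flat
            }
        }
        tx.commit();
        session.close();
    }
}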
 
Bogdan Baraila
Ranch Hand
Posts: 43
Using multithreading and StatelessSession I have now obtained a time of 35 seconds for 1 million records. It could go even lower if I used more threads, but for now this is enough.
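A sketch of that multithreaded variant: split the id range across a fixed pool, with each worker opening its own StatelessSession, since sessions are not thread-safe (the thread count and the hypothetical LogEntry entity are assumptions):

import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import org.hibernate.SessionFactory;
import org.hibernate.StatelessSession;
import org.hibernate.Transaction;

public class ParallelInsert {
    public static void insertAll(final SessionFactory factory) throws InterruptedException {
        final int total = 1000000;
        int threads = 4;                                 // tune to your hardware
        final int chunk = total / threads;
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        for (int t = 0; t < threads; t++) {
            final int from = t * chunk;
            final int to = (t == threads - 1) ? total : from + chunk;
            pool.submit(new Runnable() {
                public void run() {
                    // Each worker needs its own session; they are not thread-safe.
                    StatelessSession session = factory.openStatelessSession();
                    Transaction tx = session.beginTransaction();
                    for (int i = from; i < to; i++) {
                        LogEntry e = new LogEntry(); // hypothetical entity from the earlier sketch
                        e.setId((long) i);
                        e.setCols("a", "b", "c", "d");
                        session.insert(e);
                    }
                    tx.commit();
                    session.close();
                }
            });
        }
        pool.shutdown();
        pool.awaitTermination(10, TimeUnit.MINUTES);
    }
}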
 