Win a copy of Java Concurrency Live Lessons this week in the Threads forum!
  • Post Reply Bookmark Topic Watch Topic
  • New Topic

help with batch processing large number of db records  RSS feed

 
Sib Pete
Greenhorn
Posts: 26
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
I am working on developing a webapp (visual jsf, ejb3, hibernate on glassfish/mysql) that works with an existing legacy system.

I have an 'employee' table (with fields such as name (String), location (String), etc.) which is separate from an employee table on the legacy db2 side. I do all of the webapp processing with my employee table. However, every week I need to schedule a task to go through all the employee entities in my table and compare them against the employees in the legacy db2 table. If the employee location has changed in the legacy table, I need to update my employee table to reflect the new location.

What would you suggest as the best way to go about doing this?

Currently I am reading in all the employees into an ArrayList and then looping through each employee entity in the list, getting the corresponding legacy employee instance, comparing locations and updating my employee entity if location change detected.

Since I have close to 50000 records in my employee table, the initial build of the ArrayList takes around 5 minutes and this employee number will only keep on increasing.

Any help would be greatly appreciated.

Thanks.
 
Paul Sturrock
Bartender
Posts: 10336
Eclipse IDE Hibernate Java
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
Can you put a modify date on your legacy table, and populate it with a trigger? If you can then your update list becomes all records from this table where modify date is in the last week, rather than all records in this table.

Alternatively if you can't, could you spread this operation so it runs throughout the week? A watermark table (e.g. the low and high PK of the last employee records processed) and a periodic task (e.g. event hour or so) could process the table bit by bit rather than in one big go.

One last idea - could you add a stored procedure to your legacy database that can use a cursor to go through this table in chunks? I don;t think DB2 supports a LIMIT of TOP keyword, so I think a cursor is the only way to emulate this behaviour.
 
Sib Pete
Greenhorn
Posts: 26
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
Unfortunately, I cannot modify the legacy table. Otherwise I was thinking of using triggers too. Same reason goes for not being able to add a stored procedure either.

Looks like I will have to use jpa query's "setMaxResults()" and "setFirstResults()" methods to retrieve employee data in chunks. These methods are used for paginating display data in the UI, so I dont see any reason why I cant do the same. This way I can process chunks at a time. And I could probably throw in a queue and mdb for processing the chunks in parallel since I cant create threads within the ejb container.
 
Paul Sturrock
Bartender
Posts: 10336
Eclipse IDE Hibernate Java
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator

Looks like I will have to use jpa query's "setMaxResults()" and "setFirstResults()" methods to retrieve employee data in chunks

I may be wrong, but I don't thing DB2 has any native mechanism to support these features the way you might hope it could. From memory pagination in JPA implementations is done as a post filter for DB2 for this reason, i.e. step one is still selecting everything from the table. Like I say I may be worng, DB2 may have move on. But setMaxResults() and setFirstResults() can't work any other way if your database doesn't support something like SQL Server's TOP or MySQL's LIMIT clause, or has a work around like Oracle does using row numbers.
 
Niv Tyagi
Greenhorn
Posts: 21
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator

Not sure about DB2, But i got around a similar problem using MySQL by using LIMIT clause with select.
This worked wonderfully not only i could get the total count but count even page through the result set one set at a time.
Do you know if DB2 has something similar.

 
Bear Bibeault
Author and ninkuma
Marshal
Posts: 65833
134
IntelliJ IDE Java jQuery Mac Mac OS X
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
"darth vader", please check your private messages for an important administrative matter.
 
  • Post Reply Bookmark Topic Watch Topic
  • New Topic
Boost this thread!