
Batch job

 
Stephen King
Greenhorn
Posts: 23
Hi Ranchers!!

I would welcome your input on one of our requirements. We need to implement a batch job in Java that processes approximately 2 million records within a short window (30 minutes at most).

The data can be supplied either as flat files (fixed-length or delimited) or as XML (format yet to be defined). Each input record needs to be parsed from the input file and processed, and the results inserted into an Oracle database. For the input format, I am wondering whether XML would add performance overhead. Can anyone comment on the trade-off between XML and flat-file input?

Also, any advice on the application design (some dos and don'ts) would be a great help.

Thanks in advance.

Regards,
Steve
 
William Brogden
Author and all-around good cowpoke
Rancher
Posts: 13078
If you are sure that the fixed file format is not going to change, there is no reason to go to XML. If somebody is likely to sneak a new data item into the input later in the project, XML will probably handle it better.

In any case, insertion into the database is bound to be the limiting step, not parsing the input, whether that is a flat record or XML read with SAX.

If this were my problem, I would work backwards from whatever is the best practice for batch record insertion into your database. Whatever it is, figure out a way to support it!
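
For inserts over JDBC, one commonly used technique is statement batching with auto-commit turned off. The sketch below is only an illustration of that idea, not a recommendation specific to your Oracle setup: the table name, column names, and the `String[]` row shape are all hypothetical, and the right batch size has to be measured. The small `batchCount` helper just shows the arithmetic, for example 2 million records at 1000 per batch means 2000 round trips.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;
import java.util.List;

final class BatchInsertSketch {

    // Hypothetical table and columns; substitute the real schema.
    static final String INSERT_SQL =
            "INSERT INTO batch_result (record_id, payload) VALUES (?, ?)";

    /** Number of executeBatch() round trips needed for a given record count. */
    static int batchCount(int records, int batchSize) {
        return (records + batchSize - 1) / batchSize;
    }

    /** Inserts all rows using JDBC batching, with one commit at the end. */
    static void insertAll(Connection conn, List<String[]> rows, int batchSize)
            throws SQLException {
        conn.setAutoCommit(false);
        try (PreparedStatement ps = conn.prepareStatement(INSERT_SQL)) {
            int pending = 0;
            for (String[] row : rows) {
                ps.setString(1, row[0]);
                ps.setString(2, row[1]);
                ps.addBatch();                 // queue the row locally
                if (++pending == batchSize) {
                    ps.executeBatch();         // send the queued rows together
                    pending = 0;
                }
            }
            if (pending > 0) {
                ps.executeBatch();             // flush the final partial batch
            }
            conn.commit();
        } catch (SQLException e) {
            conn.rollback();                   // leave no half-written chunk behind
            throw e;
        }
    }
}
```

Whether a single commit at the end or a commit every few batches works better depends on the database configuration, so treat that choice as something to test.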

This is the kind of job that multiple threads are ideal for.

Be sure to plan for measuring the time required for the various steps so you don't go chasing optimizations that are not related to real problems.
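
A minimal way to get those per-step numbers is to wrap each step in a timer that accumulates elapsed time by name. This is just a sketch: the `StepTimer` class and the step names are made up for illustration, and it is not thread-safe, so with several worker threads each one would need its own instance (or a concurrent map).

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Supplier;

final class StepTimer {

    // Accumulated nanoseconds per step name, in first-seen order.
    private final Map<String, Long> nanosByStep = new LinkedHashMap<>();

    /** Runs the work, adds its elapsed time to the named step, returns its result. */
    <T> T time(String step, Supplier<T> work) {
        long start = System.nanoTime();
        try {
            return work.get();
        } finally {
            nanosByStep.merge(step, System.nanoTime() - start, Long::sum);
        }
    }

    /** Total nanoseconds recorded for a step, or 0 if it never ran. */
    long totalNanos(String step) {
        return nanosByStep.getOrDefault(step, 0L);
    }

    /** All recorded totals, for logging at the end of the job. */
    Map<String, Long> report() {
        return nanosByStep;
    }
}
```

Usage would look something like `timer.time("parse", () -> parse(line))` and `timer.time("insert", () -> insert(batch))`, where `parse` and `insert` stand for your own methods; comparing the totals shows which step deserves attention.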

Bill
 
rajesh bala
Ranch Hand
Posts: 66
Not sure if the following approach can help you.

Flat file approach:
==================
1. Have multiple threads, each reading the file from a different point. Say thread #1 reads records 1-1000, thread #2 reads records 1001-2000, and so on. The number of records read by each thread should be configurable.

2. Each thread opens its own connection to the database and inserts its records. Consider using bulk inserts followed by a commit, which should improve insert performance.
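
The two steps above can be sketched roughly as follows. Note the assumptions: the records are already available as a list of strings (so this does not show threads seeking into the file themselves), and the `sink` parameter is a hypothetical stand-in for "open a connection, bulk-insert the chunk, commit". Chunk size and thread count are the configurable parts.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Consumer;

final class ChunkedProcessor {

    /** Splits records into consecutive chunks of at most chunkSize entries. */
    static List<List<String>> partition(List<String> records, int chunkSize) {
        List<List<String>> chunks = new ArrayList<>();
        for (int from = 0; from < records.size(); from += chunkSize) {
            int to = Math.min(from + chunkSize, records.size());
            chunks.add(records.subList(from, to));
        }
        return chunks;
    }

    /** Runs the sink once per chunk on a fixed pool; returns records handled. */
    static int processAll(List<String> records, int chunkSize, int threads,
                          Consumer<List<String>> sink)
            throws InterruptedException, ExecutionException {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<Integer>> results = new ArrayList<>();
            for (List<String> chunk : partition(records, chunkSize)) {
                results.add(pool.submit(() -> {
                    sink.accept(chunk);    // e.g. bulk-insert this chunk, then commit
                    return chunk.size();
                }));
            }
            int total = 0;
            for (Future<Integer> result : results) {
                total += result.get();     // rethrows any worker failure
            }
            return total;
        } finally {
            pool.shutdown();
        }
    }
}
```

With a fixed pool, the number of simultaneous database connections is bounded by the thread count, which is worth keeping in mind when sizing it.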

XML approach:
==================
1. Not sure whether you would have one big XML file or 2 million small XML files.
2. Consider using a StAX parser (a pull-based parser) to improve performance when parsing large XML.
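
As a rough illustration of the pull-based style, here is a sketch using the standard `javax.xml.stream` API. The `<record>` element name and the idea that each record is plain text inside it are assumptions, since the XML format is not defined yet.

```java
import java.io.Reader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

final class StaxRecordReader {

    /** Pulls the text of every <record> element (hypothetical element name). */
    static List<String> readRecords(Reader xml) throws XMLStreamException {
        List<String> records = new ArrayList<>();
        XMLStreamReader reader =
                XMLInputFactory.newInstance().createXMLStreamReader(xml);
        try {
            while (reader.hasNext()) {
                // Advance one event at a time; only <record> starts are of interest.
                if (reader.next() == XMLStreamConstants.START_ELEMENT
                        && "record".equals(reader.getLocalName())) {
                    records.add(reader.getElementText());
                }
            }
        } finally {
            reader.close();
        }
        return records;
    }
}
```

For a real 2-million-record file you would hand each record to the processing step as it is read rather than collecting them all in a list; the list here only keeps the example short.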


~Rajesh.B
 