• Post Reply
  • Bookmark Topic Watch Topic
  • New Topic

Data Loading Strategy

 
Tom Purl
Ranch Hand
Posts: 104
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
I have a general question about reading large amounts of data from a database using Java.

Basically, I am trying to convert a very large amount of data from one data base to another. The data formats are somewhat non-standard, so I have to write a program to do the data transformation.

In the past, I have follwed this basic alogrithm when converting data:

1. Read all of the data from the source into RAM (into some sort of data structure).
2. Validate/manipulate data in the data structure.
3. Read each element in the data structure and write it to the new data source.

Step 1 is no longer a feasible option. Unfortunately, I really don't know of any other way of doing this. Could someone please point me in the direction of a resource or give me a decent idea?

Thanks a ton!

Tom Purl
 
Bear Bibeault
Author and ninkuma
Marshal
Pie
Posts: 65123
91
IntelliJ IDE Java jQuery Mac Mac OS X
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
Is the data transformation dependent upon having all the data in memory? If not, why not just transform a record at a time?
 
Tom Purl
Ranch Hand
Posts: 104
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
No, the transformation isn't dependent on having all of the data in memory. My program currently transforms one record at a time. It converts the ResultSet records into an object first to do some validation and manipulation.

My problem is that I need to "chunk" my data retrieval from the database. I can't possibly read all of the data at once and store it into a single ResultSet. However, I've never had to do something like this before and I'm looking for some best practices.

Has anyone seen anything like this on a website or in a book?
 
Pradeep Ram
Greenhorn
Posts: 18
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
Tom,
I had worked on something of a similar nature. We were evaluating some tools that could do the job for us and also write our custom code (java) to do the same.

We had adopted a mixture of approaches. We had to get data into a composite repository from different data sources (not necessarily RDBMS).

The following is the list of solutions that we had used
1. For smaller pieces of data from a file source, usage of JMS queues to convert the records into messages and have the DBRecordBuilder to convert these messages to SQL format and perform inserts (works well for small messages + its asynchronous)
2. Migration from exisiting database - We had written custom classes that read about 350 records in one go and work on the data (using a custom DataValidator class) and insert each record back into the target database/tables. This approach worked well, as we could perform action on each record and flag those records that could not be processed. But it took a while to process (this was ok with us as it was a one time port; we did this for about 1.5 million records -- we had this clocked under 2 hours)

3. This was an experimental strategy that we tried, but never got around doing it. The purpose was to create a flat file data for each of the table that we had wanted to insert into (using Stored procs for validation) and Perl to get the results to flat files and using SQLLoader (Oracle) to process the files and insert the information in the target database.-- we had some issues with SQLLoader which we could not get around fixing it in the short time...so we dropped this idea.

4. Using XML (I think u must have disucsses this solution) , but its painstaking and a waste of XML processing. use it onloy if you need to send the data to another source who does not have a defined format -- a definte overkill
5. If your taget and source database is Oracle, try using PRO*C for performance . If it is really necessary to use Java here, then try one of the 3 approaches... I am not sure if there is a pattern here for such large volume, am investigating it for another project that I am working on..will let you know if I come across something ...
--Pradeep N Ram
 
Tom Purl
Ranch Hand
Posts: 104
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
Thanks a ton for your help, Pradeep!
 
steve souza
Ranch Hand
Posts: 862
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
You don't specify the backend database you are using. Sybase ASE lets you limit the number of rows you get with the set rowcount command (set rowcount 1000). If so you can always query anything greater than the greatest primary key value of the previous batch and get rows say 1000 at a time until you've processed them all.
 
  • Post Reply
  • Bookmark Topic Watch Topic
  • New Topic