• Post Reply Bookmark Topic Watch Topic
  • New Topic
programming forums Java Mobile Certification Databases Caching Books Engineering Micro Controllers OS Languages Paradigms IDEs Build Tools Frameworks Application Servers Open Source This Site Careers Other Pie Elite all forums
this forum made possible by our volunteer staff, including ...
Marshals:
  • Tim Cooke
  • Campbell Ritchie
  • paul wheaton
  • Ron McLeod
  • Devaka Cooray
Sheriffs:
  • Jeanne Boyarsky
  • Liutauras Vilda
  • Paul Clapham
Saloon Keepers:
  • Tim Holloway
  • Carey Brown
  • Piet Souris
Bartenders:

Searching and Updating a CSV file

 
Greenhorn
Posts: 9
  • Mark post as helpful
  • send pies
    Number of slices to send:
    Optional 'thank-you' note:
  • Quote
  • Report post to moderator
Hi. My first time here and I'm hoping I can get some advice with the following.
To begin with, I have to perform merge/purge processing on joined tables in an Oracle DB (8.1.7) on AIX. However, I have to use a third party tool an NT server(Sagent's Merge/Purge product) to actually do the merge/purge.
The joined tables create roughly 6 million records that represent about 9.6 GB of data.
My approach has been to create a stored java procedure in Oracle to create a .csv file on the AIX system. This takes nearly 8hrs. I then FTP the file over to the NT server on which the merge/purge applications exists and run the application against the .csv file I just transferred. Running Sagen'ts merge/purge against the flat file is very fast and produces output files containing matching records.
What I need help with is
1. Is creating the .CSV and doing and FTP the best approach and if so, how can I improve the performance.
2. I want to avoid downloading the .csv file to the NT server any more than necessary and would like to update the .csv file with the output from the merge/purge results after updating the Oracle tables. So, I need to be able to search, insert and delete records in the .csv file. In other words I'd need to keep the Oracle tables and .csv sync'd. So I would like some advice on the best way to handle this if anyone has any experience with this sort of thing.
FYI: After create the result, I loop through the resultset and add a record at a time using
BufferedWriter writer = new BufferedWriter(new FileWriter("d:\\vicbatchload.txt")4000);
while (CustomerResultSet.next()) {field1 + "," + field2 + ","....)
Thank you,


[ November 17, 2003: Message edited by: Bill Earnhardt ]
 
Ranch Hand
Posts: 336
  • Mark post as helpful
  • send pies
    Number of slices to send:
    Optional 'thank-you' note:
  • Quote
  • Report post to moderator
Just a few thoughts:
Have you considered creating the .csv file on the NT box. You run your java thing on the NT box and connect, with JDBC, to your AIX.
You are using concatenation as in filed1 + "," + ... Consider to send em separately:
print filed1
print ","
print filed2
print ","
In any case 8 hrs is just too much... (but you know that )
 
Bill Earnhardt
Greenhorn
Posts: 9
  • Mark post as helpful
  • send pies
    Number of slices to send:
    Optional 'thank-you' note:
  • Quote
  • Report post to moderator
Thank you for the quick response. Unfortunately, the data is within Oracle and Oracle provides their own (fully Sun compliant) JVM. Running a stored procedure within Oracle will produce the best performance. Having said that, there may be a better approach than the one I'm using. You're suggestion of using print instead of concatenation sounds good.
Is there a practical size for the output buffer that might help reduce overall I/O. I can't tell much if any difference when I increase the size.
BufferedWriter writer = new BufferedWriter(newFileWriter("/employee.csv")4000);
 
Leslie Chaim
Ranch Hand
Posts: 336
  • Mark post as helpful
  • send pies
    Number of slices to send:
    Optional 'thank-you' note:
  • Quote
  • Report post to moderator
Yes, Oracle's internal JVM is certainly more efficient. However, in your case it sounds to me that you are only extracting data and not doing any computations or insert/delete etc. I don't think you gain much from using Oracle's JVM. Furthermore, the whole point of Oracle's JVM is for database operations and in your case you are accessing the external file system.
In any case, it should not be that complicated to try a run from your NT box.
Let us know..
 
Bill Earnhardt
Greenhorn
Posts: 9
  • Mark post as helpful
  • send pies
    Number of slices to send:
    Optional 'thank-you' note:
  • Quote
  • Report post to moderator
Leslie, thank you for the input and you make a good point regarding how I'm using the oracle JVM. I'll try calling the data from the NT box. I't can't be any worse.
Regards,
 
Leslie Chaim
Ranch Hand
Posts: 336
  • Mark post as helpful
  • send pies
    Number of slices to send:
    Optional 'thank-you' note:
  • Quote
  • Report post to moderator
Also, to 'sync up' with your .csv file I am not sure if such a fruit exists...
You can use a JDBC driver to work with your .csv file. I know that Perl has with the DBI module. I belive that you can do it with JDBC. I just don't have all the details.
Having mentioned Perl you may want to consider... Especially, in your case which sounds like a one-time-only thing. Sorry Jim I just had to authenticate
 
Wanderer
Posts: 18671
  • Mark post as helpful
  • send pies
    Number of slices to send:
    Optional 'thank-you' note:
  • Quote
  • Report post to moderator
Sorry Jim
No problem; you know I would have worried otherwise.
 
Bill Earnhardt
Greenhorn
Posts: 9
  • Mark post as helpful
  • send pies
    Number of slices to send:
    Optional 'thank-you' note:
  • Quote
  • Report post to moderator
Leslie, thank you again. As I delve deeper into the use of JAVA the more powerful I find it to be. Regardless of whether I can achieve better performance in creating the .csv file, I've gained a lot of good experience and advice in the process.
I'll keep you posted and if I find anything that might be particularly interesting, I'll pass that along as well.
Regards,
Bill
 
Bill Earnhardt
Greenhorn
Posts: 9
  • Mark post as helpful
  • send pies
    Number of slices to send:
    Optional 'thank-you' note:
  • Quote
  • Report post to moderator
After several attempts to find a way to get the performance I was looking for, I realized that the fastest and easiest way was to just spool the output to a file.
Oracle's sql plus has a number of switches that make this very efficient. If anyone has any interest in how do this, I'll be glad to post the script.
Thanks again....
Bill
 
Consider Paul's rocket mass heater.
reply
    Bookmark Topic Watch Topic
  • New Topic