Huge file processing

 
Ranch Hand
Posts: 82
Hi,

I have a requirement where I have a property file (name/value pairs) with 300,000 records and a size of, say, 3 GB. I need to persist it in a database.
What approach can be taken to persist it in the database effectively?

I mean:
1. How to read the file?
2. What collections can be used?
3. How to handle the transactions effectively?

Any views are appreciated!

Thanks in advance!
 
Ranch Hand
Posts: 128
As the file is that big, reading it all into memory is a bad idea. I'd recommend using a BufferedReader and reading your file line by line. Then, for each line, extract the name and the value and persist them in the database. No collection is necessary for that approach. As for the transactions, it depends on your goal - you may want to open and commit a transaction for each line, or one for the whole file (an all-or-nothing approach).
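A minimal sketch of that approach, assuming each line looks like name=value, a Java 7+ JDK for try-with-resources, and a made-up key_value table and connection URL:

```java
import java.io.BufferedReader;
import java.io.FileReader;
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;

public class PropertyFileLoader {
    public static void main(String[] args) throws Exception {
        try (Connection con = DriverManager.getConnection("jdbc:yourdb://host/db", "user", "password");
             BufferedReader reader = new BufferedReader(new FileReader("huge.properties"));
             PreparedStatement ps = con.prepareStatement(
                     "INSERT INTO key_value (name, value) VALUES (?, ?)")) {

            con.setAutoCommit(false);              // one transaction for the whole file
            String line;
            while ((line = reader.readLine()) != null) {
                int eq = line.indexOf('=');
                if (eq < 0) {
                    continue;                      // skip malformed lines
                }
                ps.setString(1, line.substring(0, eq).trim());
                ps.setString(2, line.substring(eq + 1).trim());
                ps.executeUpdate();                // only the current line is ever in memory
            }
            con.commit();                          // all-or-nothing commit
        }
    }
}
```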
 
Sheriff
Posts: 22796
If you use a PreparedStatement, you can use that one single PreparedStatement for all records.
Or perhaps you can try the addBatch and executeBatch methods. I don't know where that will cache the insert queries, though, so it may be as bad as reading the entire file into memory.
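A rough sketch of the batching variant; flushing the batch every few thousand rows keeps the driver from buffering all 300,000 inserts at once (the table, columns and connection URL are made up):

```java
import java.io.BufferedReader;
import java.io.FileReader;
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;

public class BatchLoader {
    private static final int BATCH_SIZE = 1000;   // flush every 1000 rows

    public static void main(String[] args) throws Exception {
        try (Connection con = DriverManager.getConnection("jdbc:yourdb://host/db", "user", "password");
             BufferedReader reader = new BufferedReader(new FileReader("huge.properties"));
             PreparedStatement ps = con.prepareStatement(
                     "INSERT INTO key_value (name, value) VALUES (?, ?)")) {

            con.setAutoCommit(false);
            String line;
            int count = 0;
            while ((line = reader.readLine()) != null) {
                int eq = line.indexOf('=');
                if (eq < 0) {
                    continue;
                }
                ps.setString(1, line.substring(0, eq).trim());
                ps.setString(2, line.substring(eq + 1).trim());
                ps.addBatch();
                if (++count % BATCH_SIZE == 0) {
                    ps.executeBatch();            // send the batch, freeing driver-side memory
                }
            }
            ps.executeBatch();                    // flush the remainder
            con.commit();
        }
    }
}
```

The batch size is easy to tune; the important part is that neither the file nor the full set of pending inserts ever has to sit in memory at once.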
 
Greenhorn
Posts: 4
BufferedReader is fine for reading a huge file made up of multiple records.
But if it is a huge file of "one record" (many records concatenated into a single line), say 3 GB, even BufferedReader with the -Xms1g -Xmx3g command-line options won't be able to process the data, because readLine() tries to build the whole line in memory.
That is my situation: I have to process a 4.76 GB file consisting of one huge line (the file provider has sent us that kind of file, concatenated into one line!).
Maybe I will have to split the file into smaller pieces for processing, and then merge them back into one file to send to the next system.

Is there any better idea than splitting it into pieces?
 
Marshal
Posts: 79634
Welcome to JavaRanch

You might have done better to start a new thread rather than reopening an old topic.

You will have to get details of the format of the file from whoever supplied it. Is there anything which starts off, or finishes, a record, that is distinguishable from anything else? If there is, can you match it with a regular expression and use a Scanner to read the file?
Is there a record number which increases from record to record?
Are the records in the file of a uniform length? In which case can you read a certain number of characters and call them a line?

I am sure other people will be able to suggest other strategies for parsing your file. If you can't get any of them to work, can you tell the file supplier off for giving you an impossible task?

I would agree with previous comments that it is better to try handling the file one record at a time than trying to handle the whole thing.
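For illustration, a sketch of the Scanner idea, assuming a hypothetical record marker such as "#REC" between records (useDelimiter takes a regular expression):

```java
import java.io.File;
import java.util.Scanner;

public class RecordScanner {
    public static void main(String[] args) throws Exception {
        // "#REC" is a made-up record separator; replace it with whatever really delimits records
        try (Scanner scanner = new Scanner(new File("huge.dat")).useDelimiter("#REC")) {
            while (scanner.hasNext()) {
                String record = scanner.next();   // one record at a time, never the whole file
                process(record);
            }
        }
    }

    private static void process(String record) {
        // parse a single record here
    }
}
```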
 
Young Choi
Greenhorn
Posts: 4
Thank you for your attention!
Yes, the file's logical records have a fixed format, and each record is 215 bytes.
In that case, can I read the 4.76 GB file using the Scanner class? By the way, I did not know about the Scanner class until I saw your comment here, as I still use SDK 1.4.
If so, could you post a sample of Scanner usage for handling a huge file of one big line like mine?

Thanks again.
 
Campbell Ritchie
Marshal
Posts: 79634
If your file is binary, then I am afraid Scanner probably won't work; it only works on text files.
 
Author and all-around good cowpoke
Posts: 13078
If the file is in a fixed binary format, then the obvious approach is to do a binary read into a byte[] of the record size, then pass the byte[] to a method that knows how to unpack it. Let the file system take care of buffering; just work with one record at a time.

Do NOT use a Reader because Readers try to do a character conversion. Scanner also assumes you are working with a text String in a given character set.

You will need a complete record layout to work out how to pick Java values out of the byte[].

Bill
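A minimal sketch of that approach, assuming the 215-byte record size mentioned above (the unpack method is just a placeholder for the real record layout):

```java
import java.io.DataInputStream;
import java.io.EOFException;
import java.io.FileInputStream;

public class FixedRecordReader {
    private static final int RECORD_SIZE = 215;   // assumed fixed record length

    public static void main(String[] args) throws Exception {
        byte[] record = new byte[RECORD_SIZE];     // reused for every record
        try (DataInputStream in = new DataInputStream(new FileInputStream("huge.dat"))) {
            while (true) {
                try {
                    in.readFully(record);          // blocks until exactly 215 bytes are read
                } catch (EOFException end) {
                    break;                         // no more complete records
                }
                unpack(record);
            }
        }
    }

    private static void unpack(byte[] record) {
        // pick the individual Java values out of the byte[] according to the record layout
    }
}
```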
 
Young Choi
Greenhorn
Posts: 4
Thanks a lot, Bill!
The approach you suggested was exactly right.
I used DataInputStream to read the file into a byte[] of fixed (or calculated) length for each chunk, and it works beautifully.
So reading a file consisting of one huge record (or line) is no longer a problem in Java. The remaining small issue is the time it takes to write the 4.76 GB of 215-byte records to an output file. But I suppose there is no quick way around that.

Do appreciate again!
 
Campbell Ritchie
Marshal
Posts: 79634
Writing to a file is just like reading, only backwards. If you used Readers before, use Writers (FileWriter has a constructor which takes a boolean, allowing you to append the text to the end of the file). If you used an XYZInputStream before, try an XYZOutputStream. You should find initial hints in the Java™ Tutorials.
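For example, the append constructor looks like this (file name made up):

```java
import java.io.BufferedWriter;
import java.io.FileWriter;

public class AppendExample {
    public static void main(String[] args) throws Exception {
        // the boolean 'true' makes FileWriter append to out.txt instead of overwriting it
        try (BufferedWriter out = new BufferedWriter(new FileWriter("out.txt", true))) {
            out.write("one more record");
            out.newLine();
        }
    }
}
```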
 
William Brogden
Author and all-around good cowpoke
Posts: 13078
OK - IF you are both reading and writing records, it may be profitable to add your own buffering. We don't want the physical disk head to be chasing back and forth between the reading and writing areas.

See java.io.BufferedOutputStream - note you can construct a huge output buffer, much larger than the default operating system one. Be sure to make it a multiple of the record size.

The java.io package presents many elegant demonstrations of the "Decorator" design pattern, making it easy to do some time trials with and without a huge output buffer.

Do some time trials and let us know how much difference it makes.

Bill
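A sketch of that idea, again assuming 215-byte records; the output buffer size below is an arbitrary multiple of the record size, and transform is a placeholder for the per-record work:

```java
import java.io.BufferedOutputStream;
import java.io.DataInputStream;
import java.io.EOFException;
import java.io.FileInputStream;
import java.io.FileOutputStream;

public class RecordCopier {
    private static final int RECORD_SIZE = 215;
    private static final int BUFFER_SIZE = RECORD_SIZE * 10000;   // roughly 2 MB, a multiple of the record size

    public static void main(String[] args) throws Exception {
        byte[] record = new byte[RECORD_SIZE];
        try (DataInputStream in = new DataInputStream(new FileInputStream("huge.dat"));
             BufferedOutputStream out = new BufferedOutputStream(
                     new FileOutputStream("processed.dat"), BUFFER_SIZE)) {
            while (true) {
                try {
                    in.readFully(record);
                } catch (EOFException end) {
                    break;                 // no more complete records
                }
                transform(record);         // whatever per-record processing is needed
                out.write(record);         // buffered; only hits the disk when the big buffer fills
            }
        }
    }

    private static void transform(byte[] record) {
        // placeholder for the per-record work
    }
}
```

Timing the copy with and without the explicit buffer size should show how much difference the larger buffer makes.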