Processing a large comma-separated file

 
Ranch Hand
Posts: 351
Hello Friends,

I need to read a very large comma-separated file (.txt or .csv, depending on the requirement) and want
to be able to skip records that have already been processed. For example, if for some reason the file
is processed only halfway, I need to be able to set a counter to the location from which processing should
resume the next time.

If I read the file using SQL, in one example I saw
strSQL = "select * from " + filepath being executed and the resultset traversed. That way I would be able
to move the record counter easily using the ResultSet methods.

I wanted to know if anyone has used this approach, and whether it is an efficient way of reading
a large comma-separated file.

If anyone has a better idea, or can make me aware of any flaws in the above approach, that would be nice.

Regards,
Leena
 
Rancher
Posts: 43081
I have not used a CSV/JDBC driver, but I would expect it to add significant overhead. Working directly at the file level is likely to be more performant. There are ready-made helper classes, such as the Ostermiller CSV utilities, that help with reading the file.
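To illustrate the file-level approach with a resume counter: a minimal sketch (class and method names are my own, not from any library) that reads the file line by line with a BufferedReader, skips records already handled in a previous run, and uses a naive split that does not handle quoted commas — a real CSV parser like the Ostermiller one would cover that case.

```java
import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;

public class CsvSkipReader {

    // Reads the file line by line, skipping the first `alreadyProcessed`
    // records, and returns how many records were processed in this run.
    public static int process(String path, int alreadyProcessed) throws IOException {
        int processed = 0;
        try (BufferedReader reader = new BufferedReader(new FileReader(path))) {
            String line;
            int lineNo = 0;
            while ((line = reader.readLine()) != null) {
                lineNo++;
                if (lineNo <= alreadyProcessed) {
                    continue; // handled in a previous run
                }
                // Naive split: does NOT handle commas inside quoted fields.
                String[] fields = line.split(",", -1);
                // ... process fields here ...
                processed++;
            }
        }
        return processed;
    }
}
```

Persisting `lineNo` (to a small state file, say) after each batch would let the next run pass it back in as `alreadyProcessed`. The drawback is that resuming still has to re-read and discard all the skipped lines from the start of the file.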
 
Ranch Hand
Posts: 531
Sounds like you want random access to the file (e.g., to skip the first half), so my inclination would be to use a RandomAccessFile. Java's memory-mapped file facilities are also excellent.
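A rough sketch of that idea (the class name is mine): instead of counting records, save the byte offset after the last fully processed line, and seek straight back to it on the next run. One caveat worth knowing: RandomAccessFile.readLine() is unbuffered and only reliable for single-byte encodings, so for very large files you may want to layer buffering on top.

```java
import java.io.IOException;
import java.io.RandomAccessFile;

public class ResumableReader {

    // Resumes reading at a previously saved byte offset and returns the
    // offset after the last complete line, to be saved for the next run.
    public static long processFrom(String path, long startOffset) throws IOException {
        try (RandomAccessFile raf = new RandomAccessFile(path, "r")) {
            raf.seek(startOffset); // jump straight to the restart point
            String line;
            while ((line = raf.readLine()) != null) {
                String[] fields = line.split(",", -1);
                // ... process fields here ...
                // raf.getFilePointer() now marks a safe restart point
            }
            return raf.getFilePointer();
        }
    }
}
```

Unlike skipping by record count, seeking to a byte offset is O(1) regardless of how far into the file processing got.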
 
Leena Diwan
Ranch Hand
Posts: 351
Hello All,

I have done some trial and error on this. I tested the Ostermiller utility, a while loop with StringTokenizer,
and a while loop with my own token separator.

I checked the time taken and the free memory.

Time taken: by taking the system time before and after processing, then taking the difference.
Free memory: by reading Runtime.getRuntime().freeMemory() before and after processing.

My comma-separated files are going to be huge, so I don't want logic that takes up a lot of memory.
Is this free-memory difference a proper measure of how much memory is being used? Is it foolproof
in all scenarios?

I feel the utility actually requires a lot of memory. A lot of memory gets allocated, and that results
in a bigger difference when the free-memory delta is calculated.

Does anyone have good experience with the utility from a performance point of view?

Regards,
Leena
 