
Finding rowcount for large files, Reading pipe delimited file

 
Kasi Viswan
Ranch Hand
Posts: 42
Hi,

Is there an easy way to get the number of rows in a large file (say 500,000 rows) before I iterate over the entire file? It would help me set the maximum value for a progress bar.

I currently read the file with a FileReader wrapped in a CSVReader, which reads and parses the pipe-delimited file and works fine.
Is there a better way to do it? String.split() didn't work for some rows when running on a lengthy file: it didn't return the same number of elements in the returned String array for every row.

Thanks
Kasi

 
Mike Simmons
Ranch Hand
Posts: 3090
With a text file where each line can have a different length, the only way to really know how many lines there are is to read each and every character to find out if it's a newline. There may be faster ways to do this than what you're currently doing. However, if the only reason for this is to calibrate and update a progress bar, it's probably simpler to forget about lines, and count bytes instead. You can get the number of bytes in a file with a simple call to the length() method in File. And to count the bytes as you read them, there are several approaches. I would create a Checksum implementation, and use it with a CheckedInputStream to count the bytes as they are read. You can see an example of this here. That example uses a CheckedOutputStream rather than a CheckedInputStream, but the code would be very, very similar to what's shown.
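A minimal sketch of the byte-counting idea described above, assuming a UTF-8 text file read line by line. The class names ByteCounter and ProgressDemo and the method readWithProgress are hypothetical, made up for illustration; only Checksum, CheckedInputStream and the standard reader classes come from the JDK.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.util.zip.CheckedInputStream;
import java.util.zip.Checksum;

/** Hypothetical Checksum that does no checksumming; it only counts bytes. */
class ByteCounter implements Checksum {
    private long count;

    @Override public void update(int b) { count++; }
    @Override public void update(byte[] b, int off, int len) { count += len; }
    @Override public long getValue() { return count; }
    @Override public void reset() { count = 0; }
}

class ProgressDemo {
    /**
     * Reads every line from the stream and returns the percentage of
     * totalBytes that had been consumed when the last line was returned.
     */
    static int readWithProgress(InputStream in, long totalBytes) throws IOException {
        ByteCounter counter = new ByteCounter();
        int percent = 0;
        try (BufferedReader reader = new BufferedReader(new InputStreamReader(
                new CheckedInputStream(in, counter), StandardCharsets.UTF_8))) {
            String line;
            while ((line = reader.readLine()) != null) {
                // ... process the line here ...
                // The readers buffer ahead, so this value moves in
                // buffer-sized steps rather than line by line.
                percent = (int) (100 * counter.getValue() / totalBytes);
            }
        }
        return percent;
    }

    public static void main(String[] args) throws IOException {
        // In-memory stand-in for a real file. With a File f you would pass
        // new FileInputStream(f) and f.length() instead.
        byte[] data = "Name|AccountNo\nJohn|100010\nGeorge|100050\n"
                .getBytes(StandardCharsets.UTF_8);
        System.out.println(readWithProgress(new ByteArrayInputStream(data), data.length) + "%");
    }
}
```

Scaling the count to 0..100 also keeps the value within the int range that JProgressBar expects, even when the file is larger than 2 GB.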
 
Kasi Viswan
Ranch Hand
Posts: 42
This is the full requirement.

I am supposed to read data from a pipe-delimited text file.
This text file is going to be exported from an Excel worksheet, so it is guaranteed to have the same number of columns in every row, but not guaranteed to have values in all the columns. So it can look like this:

Name|AccountNo|TransactionId|TransactionAmount|Date|Memo
John|100010|50001|20000|052308|cc payment
George|100050|50002|20000|034521|
...
...
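As an aside on the String.split() problem: one likely cause (an assumption, since the failing rows are not shown) is a row like the George one above, where the last column is empty. String.split(regex) drops trailing empty strings, so that row yields five elements instead of six. Passing a negative limit keeps them. The class and method names below are hypothetical.

```java
import java.util.Arrays;

class PipeSplit {
    /** Splits one pipe-delimited row, keeping empty trailing columns. */
    static String[] splitRow(String row) {
        // "\\|" escapes the pipe, which is a regex metacharacter.
        // The limit of -1 tells split() not to discard trailing empty strings.
        return row.split("\\|", -1);
    }

    public static void main(String[] args) {
        String row = "George|100050|50002|20000|034521|";
        System.out.println(row.split("\\|").length);        // trailing empty Memo column dropped
        System.out.println(Arrays.toString(splitRow(row))); // Memo column kept as ""
    }
}
```

This only handles plain delimiters. If fields can contain quoted pipes, a CSV parser remains the safer choice.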

After reading every row, there is some lengthy processing to be done.

I am using a Swing interface.
The users would like to have some information on the progress of the task being done.

I don't know if I can use an input stream for this case. I decided to go with a Reader, as I am going to read the entire line and then parse it into a String array using CSVReader.

I know I can count the number of lines read and display it to the user as "nth record being processed" and so on, but I thought it would be nice to show a progress bar that also gives the user an idea of how much more time the task will take. For the progress bar I need to know the number of lines in the file beforehand.

Thanks for your help.

Please advise.


 
Mike Simmons
Ranch Hand
Posts: 3090
I don't think your response changes my answer significantly. But here are some clarifications:

Kasi Viswan wrote: I don't know if I can use an input stream for this case. I decided to go with a Reader, as I am going to read the entire line and then parse it into a String array using CSVReader.


Well, I'm pretty sure you can still use an InputStream. You just have to figure out how to insert it into the mix of different streams you use to read the data. Some ideas:

or


Alternately, it wouldn't be hard to count characters instead of bytes. You could extend FilterReader to make a CountingReader, much like the ByteCounter shown elsewhere. Insert that into your chain of Readers, and use it to count characters read.
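A rough sketch of such a CountingReader, assuming it sits directly above the reader that decodes the file. The class name comes from the paragraph above; the getCount() accessor and the demo class are hypothetical.

```java
import java.io.BufferedReader;
import java.io.FilterReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;

/** FilterReader that counts the characters passing through it. */
class CountingReader extends FilterReader {
    private long count;

    CountingReader(Reader in) {
        super(in);
    }

    @Override
    public int read() throws IOException {
        int c = super.read();
        if (c != -1) {
            count++;
        }
        return c;
    }

    @Override
    public int read(char[] buf, int off, int len) throws IOException {
        int n = super.read(buf, off, len);
        if (n > 0) {
            count += n;
        }
        return n;
    }

    @Override
    public long skip(long n) throws IOException {
        long skipped = super.skip(n);
        count += skipped;
        return skipped;
    }

    /** Number of characters read or skipped so far. */
    long getCount() {
        return count;
    }
}

class CountingReaderDemo {
    public static void main(String[] args) throws IOException {
        // StringReader stands in for a FileReader on the real file.
        CountingReader counting = new CountingReader(new StringReader("John|100010\nGeorge|100050\n"));
        try (BufferedReader lines = new BufferedReader(counting)) {
            while (lines.readLine() != null) {
                // ... parse and process the line, then update progress ...
            }
        }
        System.out.println(counting.getCount() + " characters read");
    }
}
```

Because the BufferedReader above it reads ahead, the count reflects what has been pulled from the file, not what has been handed to the parser. For a progress bar that difference is usually small enough to ignore.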

Kasi Viswan wrote: I know I can count the number of lines read and display it to the user as "nth record being processed" and so on, but I thought it would be nice to show a progress bar that also gives the user an idea of how much more time the task will take. For the progress bar I need to know the number of lines in the file beforehand.

Sure. You can count lines, characters, or bytes. All are fairly easy. But for a really big file, counting lines will take some time before you can display anything meaningful to the user. Whereas counting bytes or characters can be done right away, usually. Exception: if you're using UTF-8 with any non-European languages, it's hard to estimate the total number of characters in a file without reading the whole file. But for typical English-language files, it's easy.
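To illustrate the UTF-8 caveat, here is a small sketch comparing character counts with encoded byte counts. The class and method names are hypothetical, and the non-ASCII sample is written with Unicode escapes so the source file's own encoding does not matter.

```java
import java.nio.charset.StandardCharsets;

class Utf8Sizes {
    /** Number of bytes the string occupies when encoded as UTF-8. */
    static int utf8Bytes(String s) {
        return s.getBytes(StandardCharsets.UTF_8).length;
    }

    public static void main(String[] args) {
        String ascii = "cc payment";             // plain ASCII: one byte per character
        String japanese = "\u652F\u6255\u3044";  // three characters, three bytes each in UTF-8
        System.out.println(ascii.length() + " chars, " + utf8Bytes(ascii) + " bytes");
        System.out.println(japanese.length() + " chars, " + utf8Bytes(japanese) + " bytes");
    }
}
```

When the ratio varies like this, File.length() gives the byte total exactly but not the character total, which is one reason counting bytes is the simpler basis for the progress bar.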
 