
Returning the last line in a file.

 
Chris Ramsey
Ranch Hand
Posts: 34
Hello all,
I am attempting to return the last line of a file. Ultimately I would like to read in any specific line that I choose. Performance is a must so I want to know if there is a faster way to do this. Also, with my approach, if the last line is a blank line the results are incorrect. If not then it seems to work. The other approach I was considering was maybe storing each line in a record and then just reading the last record. I guess that would be faster but I don't know how to go about implementing it. Anyway any suggestions are appreciated. Thank you.


[ December 12, 2003: Message edited by: Isaiah Brown ]
 
Jim Yingst
Wanderer
Sheriff
Posts: 18671
Performance is a must so I want to know if there is a faster way to do this.
Well, there are going to be a number of different options depending on how you're using this, and how much time you want to spend on more complicated algorithms. I suggest trying the simplest solutions first, then moving to more complex stuff if the performance is not good enough.
How big are the files you want to read? How much memory do you have? Would it be feasible to store all the lines of the file in RAM as an ArrayList of Strings, for example? This is pretty simple:
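A minimal sketch of that approach, assuming a plain text file read with the platform default encoding. The class and method names (LineLoader, loadLines) are made up for illustration:

```java
import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: loads every line of a text file into memory.
final class LineLoader {
    /** Reads all lines of the file, in order, without their line terminators. */
    static List<String> loadLines(String path) throws IOException {
        List<String> lines = new ArrayList<>();
        try (BufferedReader in = new BufferedReader(new FileReader(path))) {
            String line;
            // readLine() returns null once the end of the file is reached
            while ((line = in.readLine()) != null) {
                lines.add(line);
            }
        }
        return lines;
    }
}
```

With this, lines.get(0) is the first line and lines.get(lines.size() - 1) is the last one.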

Now you can easily access any line you want to. The only downsides are that it may take too much memory, and that it takes a bit of time to load everything into memory at the beginning. Once it's loaded though, you've got very fast access.
I note that you're already loading all your data into a byte[] (or trying to), so I gather memory isn't too much of a problem. Yet.
To save memory, instead of storing everything as Strings, you could just store a long representing the offset in bytes of each line. This would be similar to what you've already coded - but you save every line's position, not just the last one. Then when you need to retrieve a particular line, you can skip() (or seek()) to that position. You could save even more memory by saving only every 10th offset, or every 100th. So to read line 3745, look up the position of line 3700, jump there, and read 45 lines to get to 3745. Obviously this is more complex, and trades performance for memory - but it's a possibility.
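Here is a rough sketch of the sparse-offset idea, not a tuned implementation. It assumes lines end in \n (or \r\n), that the text is UTF-8 or another encoding where the byte 0x0A only ever means a newline, and that line numbers are 0-based. SparseLineIndex and its members are hypothetical names:

```java
import java.io.BufferedInputStream;
import java.io.BufferedReader;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hypothetical index: remembers the byte offset of every step-th line.
final class SparseLineIndex {
    private final String path;
    private final int step;
    // offsets.get(k) is the byte offset where line k * step begins (0-based)
    private final List<Long> offsets = new ArrayList<>();

    SparseLineIndex(String path, int step) throws IOException {
        if (step <= 0) throw new IllegalArgumentException("step must be positive");
        this.path = path;
        this.step = step;
        try (InputStream in = new BufferedInputStream(new FileInputStream(path))) {
            long pos = 0;       // offset of the byte about to be read
            long lineNo = 0;    // number of the line that byte belongs to
            boolean atLineStart = true;
            int b;
            while ((b = in.read()) != -1) {
                if (atLineStart) {
                    if (lineNo % step == 0) offsets.add(pos);
                    atLineStart = false;
                }
                pos++;
                if (b == '\n') { lineNo++; atLineStart = true; }
            }
        }
    }

    /** Returns the requested 0-based line, or null if the file has no such line. */
    String readLine(long lineNo) throws IOException {
        if (lineNo < 0 || lineNo / step >= offsets.size()) return null;
        int slot = (int) (lineNo / step);
        try (FileInputStream fis = new FileInputStream(path)) {
            // Jump to the nearest indexed line at or before the target
            fis.getChannel().position(offsets.get(slot));
            BufferedReader in = new BufferedReader(
                    new InputStreamReader(fis, StandardCharsets.UTF_8));
            String line = null;
            // Read forward from the indexed line up to and including the target
            for (long i = (long) slot * step; i <= lineNo; i++) {
                line = in.readLine();
                if (line == null) return null;
            }
            return line;
        }
    }
}
```

A step of 1 stores every offset. A larger step uses less memory, but each lookup has to read up to step - 1 extra lines.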
If you just want to read the last line, there are faster methods to do this, where you start reading bytes from the end, not the beginning. (You need a RandomAccessFile or FileChannel for this, not an InputStream.) In this case you never need to read most of the file at all. But it's only useful for the last line, or the second to last line, or the nth to last line. If you want line 3745, unless you know in advance that there are really only 3747 lines and thus you can read 3 lines from the back - you probably need to read 3745 lines from the beginning. That's because there's usually no way of knowing the line count without looking at the whole file, to see how many \n there are. So these special algorithms for reading lines from the end are of limited use, and they're rather complex, so I doubt you want to do this unless performance is really a problem.
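For the last-line case only, a simple sketch using RandomAccessFile might look like this. It assumes \n or \r\n line endings and UTF-8 text, and it treats trailing blank lines as not counting, so a file that ends in an empty line still returns the last line with content. That is a design choice, not the only reasonable one. LastLine and readLastLine are made-up names:

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.charset.StandardCharsets;

// Hypothetical helper: finds the last non-empty line by scanning backwards.
final class LastLine {
    static String readLastLine(String path) throws IOException {
        try (RandomAccessFile raf = new RandomAccessFile(path, "r")) {
            long end = raf.length();
            // Step back over any trailing \n or \r so blank lines at the end are ignored
            while (end > 0) {
                raf.seek(end - 1);
                int b = raf.read();
                if (b != '\n' && b != '\r') break;
                end--;
            }
            if (end == 0) return ""; // empty file, or nothing but line terminators

            // Walk backwards until the newline that ends the previous line
            long start = end;
            while (start > 0) {
                raf.seek(start - 1);
                if (raf.read() == '\n') break;
                start--;
            }
            // Assumes a single line fits comfortably in an int-sized array
            byte[] buf = new byte[(int) (end - start)];
            raf.seek(start);
            raf.readFully(buf);
            return new String(buf, StandardCharsets.UTF_8);
        }
    }
}
```

This seeks once per byte to keep the logic easy to follow. It only touches the tail of the file, but reading backwards in blocks of a few KB would cut down on the number of I/O calls if the last line can be long.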
Neither RandomAccessFile nor InputStream guarantees that read(byte[]) will actually read the full number of bytes you have requested. So you need to check the return value and loop to make sure you've really filled the array:
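A minimal sketch of such a loop, written against InputStream so it works for FileInputStream too. ReadLoop and fill are illustrative names, not part of any library:

```java
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;

// Hypothetical helper: keeps calling read() until the array is full.
final class ReadLoop {
    static void fill(InputStream in, byte[] buf) throws IOException {
        int off = 0;
        while (off < buf.length) {
            // read() may return fewer bytes than requested, so ask only for what is still missing
            int n = in.read(buf, off, buf.length - off);
            if (n < 0) {
                throw new EOFException("stream ended after " + off + " of " + buf.length + " bytes");
            }
            off += n;
        }
    }
}
```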

Why was this different on RandomAccessFile? Well, you were lucky. Neither class guarantees a full read, but for the implementation and platform you were using, and for the particular file and I/O hardware you were dealing with, RandomAccessFile delivered a full read while FileInputStream did not. But in another situation, you might get very different results. So you always need to check the return value and do some sort of loop to make sure you've read everything you needed.
Or, RandomAccessFile also offers a different method, readFully(byte[]). This does exactly what you'd expect. But it's not available on FileInputStream. Your choice.
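For illustration, a short sketch of the readFully(byte[]) route, assuming the whole file fits in one byte array (under 2 GB). WholeFile and readAll are hypothetical names:

```java
import java.io.IOException;
import java.io.RandomAccessFile;

// Hypothetical helper: reads an entire file with RandomAccessFile.readFully.
final class WholeFile {
    static byte[] readAll(String path) throws IOException {
        try (RandomAccessFile raf = new RandomAccessFile(path, "r")) {
            // Assumption: the file is small enough for a single array
            byte[] data = new byte[(int) raf.length()];
            // Fills the array completely, or throws EOFException if the file ends first
            raf.readFully(data);
            return data;
        }
    }
}
```

If you would rather stay with FileInputStream, wrapping it in a DataInputStream also gives you a readFully(byte[]) method.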
 
Chris Ramsey
Ranch Hand
Posts: 34
Wow! Thank you very much for your reply. I am going to try implementing your suggestions. Performance is really a requirement because the files I am processing are very large. Let me give more detail about what I am doing. Each line in the file will have pipe-delimited fields. The first field in each line describes what type of line it is.
i.e.
TYPEA|DATA|DATA|
TYPEB|DATA|DATA|
Each line must end with a pipe. The objective is to parse the fields and then display them in a text field. So, when the program starts I want to process the first and last line. That's why I needed to get the last line. Each line will represent a page of data. When the user inputs page 1, I want the first line to be parsed and displayed on the screen. The good thing is that only one line needs to be shown on screen at a time. However, I want the retrieval to be fast because of the potential size. So, anticipating user action, if they want line 3, I would go ahead and have lines 2 and 4 already processed. That's about it. Any more suggestions would be great. If it is ok, would you be willing to critique my final code? Thanks.
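As a rough sketch of the parsing step for lines in this format, assuming fields never contain a pipe themselves and that empty fields should be kept. PipeRecord and parse are made-up names:

```java
import java.util.Arrays;

// Hypothetical parser for lines such as "TYPEA|DATA|DATA|".
final class PipeRecord {
    /** Splits a pipe-terminated line into its fields; the first field is the line type. */
    static String[] parse(String line) {
        if (!line.endsWith("|")) {
            throw new IllegalArgumentException("line must end with a pipe: " + line);
        }
        // A limit of -1 keeps empty fields; the final pipe then yields one trailing empty string
        String[] parts = line.split("\\|", -1);
        // Drop that trailing empty string so only the real fields remain
        return Arrays.copyOf(parts, parts.length - 1);
    }
}
```

For "TYPEA|DATA|DATA|" this gives three fields, with "TYPEA" at index 0.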
 