
Seek backwards in BufferedInputStream

 
Bharath Chinnadurai Maharajan
Greenhorn
Posts: 26
Hi all,
I use BufferedInputStream for byte processing of a file. I use the skip method on BufferedInputStream to skip bytes that I don't need to process. Now when I encounter an error I need to re-process the bytes that I have already processed. So how can I seek or skip backwards in the BufferedInputStream? The skip API is not reacting to negative arguments (FileInputStream does). Or is there any other stream that I can use to traverse a file in both directions efficiently?

Regards,
Bharath M.
 
Jim Yingst
Wanderer
Sheriff
Posts: 18671
Try PushbackInputStream.
 
Stan James
(instanceof Sidekick)
Ranch Hand
Posts: 8791
See if mark() and reset() do what you need. If it's truly a file, also see RandomAccessFile and seek().
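
A minimal sketch of the mark()/reset() idea (the scratch file and read limit here are made up for illustration):

```java
import java.io.*;

public class MarkResetDemo {
    public static void main(String[] args) throws IOException {
        // Write a small scratch file so the example is self-contained
        File f = File.createTempFile("markdemo", ".bin");
        f.deleteOnExit();
        FileOutputStream out = new FileOutputStream(f);
        out.write(new byte[] {10, 20, 30, 40, 50});
        out.close();

        BufferedInputStream in = new BufferedInputStream(new FileInputStream(f));
        in.mark(1024);              // remember this position; keep up to 1024 bytes buffered
        int first = in.read();      // 10
        int second = in.read();     // 20
        in.reset();                 // rewind to the marked position
        int again = in.read();      // 10 again
        System.out.println(first + " " + second + " " + again);
        in.close();
    }
}
```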
[ December 18, 2007: Message edited by: Stan James ]
 
Bharath Chinnadurai Maharajan
Greenhorn
Posts: 26
Hi Stan,
my file is not markable, and I also need to go backwards to a given index. RandomAccessFile would certainly do the job, but my file is big. Using BufferedInputStream I was able to process my entire file, which is about 200 MB, in less than a second, whereas RandomAccessFile is taking about 5+ seconds. Moreover my file can get as big as 2+ GB, and I need my file processed in less than 5 seconds. Can't I get the seeking behavior of RandomAccessFile and the performance of BufferedInputStream together somehow?

Or am I doing it all wrong?

Please suggest.
 
Jim Yingst
Wanderer
Sheriff
Posts: 18671
[Bharath]: Using BufferedInputStream I was able to process my entire file, which is about 200 MB, in less than a second, whereas RandomAccessFile is taking about 5+ seconds.

That's surprising. Long ago (JDK 1.2) I got similar results from RandomAccessFile, finding it much slower than using a BufferedInputStream around a FileInputStream. The early RAF implementation was extremely slow for many things. But by JDK 1.4 when I tried again, I found it was much faster than it had been. I would be surprised if it's gotten slower again. What version of the JDK are you using? Can you show us the code you used here? Perhaps there was something wrong with the code, making it less efficient than it could be. It may also be worthwhile to use a profiler to discover which specific method calls are taking so much time.

Back when I had problems with JDK 1.2's RandomAccessFile, I was able to achieve fairly fast random access to the file by doing the following:

1. close the existing streams.
2. open a new FileInputStream
3. use skip() to get to the index you want
4. open a new BufferedInputStream around the FileInputStream
5. start reading

I was surprised to find out that this complex-sounding procedure could be much faster than RandomAccessFile, but it was - at that time. It's possible that this will be the case for you too.

Note that skip() is not guaranteed to skip all the bytes you requested it to skip. Check the return value and call the method in a loop to ensure that it skips everything you request.
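
Putting those steps together, a sketch might look like this (file name and target offset are invented; the loop around skip() follows the note above):

```java
import java.io.*;

public class ReopenAndSkip {
    // Reopen the file and position a fresh buffered stream at 'offset'
    static BufferedInputStream openAt(File file, long offset) throws IOException {
        FileInputStream fis = new FileInputStream(file);
        // skip() may skip fewer bytes than requested, so loop until done
        long remaining = offset;
        while (remaining > 0) {
            long skipped = fis.skip(remaining);
            if (skipped <= 0) {
                throw new EOFException("could not skip to offset " + offset);
            }
            remaining -= skipped;
        }
        return new BufferedInputStream(fis);
    }

    public static void main(String[] args) throws IOException {
        File f = File.createTempFile("seekdemo", ".bin");
        f.deleteOnExit();
        FileOutputStream out = new FileOutputStream(f);
        for (int i = 0; i < 100; i++) out.write(i);
        out.close();

        BufferedInputStream in = openAt(f, 42);
        System.out.println(in.read()); // the byte at offset 42
        in.close();
    }
}
```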

You could probably also update this technique to use a FileChannel instead. It's possible to code this so that you can keep the FileChannel open, while allowing other streams that read from the channel to close. This could be more efficient than closing and opening new FileChannels. It would probably require that you create a special ReadableByteChannel class which overrides the standard close() behavior to prevent the FileChannel from being closed. This is a violation of the standard close() contract, but it may be useful nonetheless. You would use it something like this:
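
A sketch of what that wrapper might look like (only the class name comes from the description here; the rest is one guess at the shape):

```java
import java.io.*;
import java.nio.ByteBuffer;
import java.nio.channels.*;

// Delegates reads to the underlying FileChannel but ignores close(),
// so streams built on top of it can be closed without losing the channel.
class NonclosingReadableByteChannel implements ReadableByteChannel {
    private final FileChannel channel;

    NonclosingReadableByteChannel(FileChannel channel) {
        this.channel = channel;
    }

    public int read(ByteBuffer dst) throws IOException {
        return channel.read(dst);
    }

    public boolean isOpen() {
        return channel.isOpen();
    }

    public void close() {
        // deliberately a no-op: violates the Channel contract, as noted above
    }
}

class NonclosingDemo {
    public static void main(String[] args) throws IOException {
        File f = File.createTempFile("chan", ".bin");
        f.deleteOnExit();
        FileOutputStream out = new FileOutputStream(f);
        out.write(new byte[] {1, 2, 3});
        out.close();

        FileChannel fc = new FileInputStream(f).getChannel();
        InputStream in = new BufferedInputStream(
                Channels.newInputStream(new NonclosingReadableByteChannel(fc)));
        System.out.println(in.read());
        in.close();                      // closes the stream, not the channel
        System.out.println(fc.isOpen()); // the channel survives
        fc.close();
    }
}
```

You would position the FileChannel with position(long) before wrapping it, then read through the buffered stream as usual.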

The NonclosingReadableByteChannel would be the new class that exists to protect the FileChannel from being closed.

Also, you haven't said anything about the PushbackInputStream suggestion. The main reason I can think of why you might not want to do this is if you need to be able to go backwards a really long way - you may not have enough memory to do that. But let us know if that's the case or not. This could be considerably simpler than the other alternatives discussed above.
[ December 19, 2007: Message edited by: Jim Yingst ]
 
Stan James
(instanceof Sidekick)
Ranch Hand
Posts: 8791
my file is not markable ...

I'm curious, why is that?
 
Bharath Chinnadurai Maharajan
Greenhorn
Posts: 26
Hi Jim,
Thanks for that detailed reply. I use JDK 1.4, and below is an example code snippet where file is a RandomAccessFile. For my test file of about 200 MB there are 844,800 read and seek calls happening. Can I expect better performance?
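
(The snippet itself didn't survive in the archive; judging from the discussion, it was a single-byte read loop of roughly this shape - the 0x47 sync check comes from the replies below, everything else is an illustrative reconstruction:)

```java
import java.io.*;

public class RafScan {
    public static void main(String[] args) throws IOException {
        // Tiny stand-in file: two 0x47 sync bytes among filler
        File f = File.createTempFile("rafscan", ".ts");
        f.deleteOnExit();
        FileOutputStream out = new FileOutputStream(f);
        out.write(new byte[] {0, 0, 0x47, 9, 9, 0x47, 9});
        out.close();

        RandomAccessFile file = new RandomAccessFile(f, "r");
        int packets = 0;
        int b;
        // One read() call per byte: this is the slow pattern
        while ((b = file.read()) != -1) {
            if (b == 0x47) {
                packets++;
                // ... process the packet, possibly with seek() ...
            }
        }
        System.out.println(packets);
        file.close();
    }
}
```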



And as for PushbackInputStream, from the docs I don't see how it can fit my need; can you please throw some more light on this? The FileChannel hack I shall give a try.

And my file is not markable. Why? I don't know; maybe there are some constraints for a file to be markable.
 
Joe Ess
Bartender
Posts: 9406
This is an old article on adding buffering to RandomAccessFile. I don't know how well it's held up.
 
Jim Yingst
Wanderer
Sheriff
Posts: 18671
[Bharath]: Can I expect better performance?

Probably. It looks like the biggest problem is that whenever you do a read that does not return 0x47, you just keep doing single reads, one byte at a time. This is usually slow; it's better to read many bytes at once. That's what your BufferedInputStream is doing for you, behind the scenes. Would it make sense to read, say, a block of 187 bytes instead? Or even 1870? Or more? I bet that would be much faster. You will probably want to use the readFully() method.
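
For instance, a block-at-a-time version using readFully() might look like this (the 188-byte packet size - one 0x47 sync byte plus 187 payload bytes - is an assumption; use the real record size):

```java
import java.io.*;

public class BlockRead {
    public static void main(String[] args) throws IOException {
        File f = File.createTempFile("block", ".ts");
        f.deleteOnExit();
        FileOutputStream out = new FileOutputStream(f);
        // Two fake 188-byte packets, each starting with the 0x47 sync byte
        byte[] packet = new byte[188];
        packet[0] = 0x47;
        out.write(packet);
        out.write(packet);
        out.close();

        RandomAccessFile file = new RandomAccessFile(f, "r");
        byte[] buf = new byte[188];
        int packets = 0;
        long length = file.length();
        while (file.getFilePointer() + buf.length <= length) {
            file.readFully(buf);        // one call reads a whole packet
            if (buf[0] == 0x47) {
                packets++;
                // ... process the packet from buf ...
            }
        }
        System.out.println(packets);
        file.close();
    }
}
```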

[Bharath]: And as for PushbackInputStream, from the docs I don't see how it can fit my need; can you please throw some more light on this?

Well, I don't really know how or why you need to "re-process the bytes that [you] have already processed". When you need to go backwards, how far back do you need to go? Is there a limit? And how often do you need to do this, anyway?

PushbackInputStream allows you to put bytes back into the stream after reading them (using unread()), essentially restoring the stream to an earlier state before you read the bytes. But you're limited in that you can only put the bytes back if you still have that data in memory, typically in an array. If that fits your problem, great; if not, never mind.
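
A small sketch of unread() in action (the data and pushback buffer size are arbitrary):

```java
import java.io.*;

public class PushbackDemo {
    public static void main(String[] args) throws IOException {
        byte[] data = {0x47, 1, 2, 3};
        // Second argument is the pushback buffer size:
        // the maximum number of bytes you can unread at once
        PushbackInputStream in = new PushbackInputStream(
                new ByteArrayInputStream(data), 4);

        int b = in.read();       // 0x47 (decimal 71)
        in.unread(b);            // put it back into the stream
        int again = in.read();   // same byte once more
        System.out.println(b + " " + again);
        in.close();
    }
}
```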

Ultimately, using mark() and reset() is probably better than PushbackInputStream. But it will only work if you never need to move back more than some finite number of bytes (which will be kept in memory). If you are at the end of a 3 GB file and need to go back to the beginning, that's too much to possibly fit in memory, so forget it.

[Bharath]: And my file is not markable. Why? I don't know; maybe there are some constraints for a file to be markable.

Markability is not a property of a file; it's a property of a stream. If you look at the API for various InputStream types, you will see that a FileInputStream never supports marking (markSupported() returns false), but a BufferedInputStream always does. I suspect you tried marking the FileInputStream; try marking the BufferedInputStream instead.
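
This is easy to check (the temp file is only there so the snippet runs standalone):

```java
import java.io.*;

public class MarkSupport {
    public static void main(String[] args) throws IOException {
        File f = File.createTempFile("marksupport", ".bin");
        f.deleteOnExit();

        FileInputStream fis = new FileInputStream(f);
        System.out.println(fis.markSupported());  // false: no mark on the raw stream

        BufferedInputStream bis = new BufferedInputStream(fis);
        System.out.println(bis.markSupported());  // true: the buffer enables mark/reset
        bis.close();
    }
}
```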
[ December 20, 2007: Message edited by: Jim Yingst ]
 
Jim Yingst
Wanderer
Sheriff
Posts: 18671
[Joe]: This is an old article on adding buffering to RandomAccessFile. I don't know how well it's held up.

Well, the talk about IO performance problems is badly out of date, considering they're talking about JDK 1.0.x. The code to add buffering could probably improve performance for single-byte reads, at least. But if you're using read(byte[]) or readFully(byte[]), I doubt it will make much difference.
 