This topic is a continuation of a previous topic on
seek() vs. seekBytes()
Originally posted by Prashant Sehgal:
(but this one seems very old...it's using deprecated API).
Did you try replacing the deprecated methods with the new API? There are only two deprecated calls, both
String constructors. And the current version of String has constructors that take a byte array, an offset, and a length as arguments.
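As a rough sketch of that replacement (the class and method names here are made up for illustration, and the choice of ISO-8859-1 is my assumption, since the deprecated hibyte constructors with a high byte of 0 behave like a Latin-1 decode):

```java
import java.nio.charset.StandardCharsets;

public class StringFromBytes {
    // Hypothetical helper: decode `length` bytes of `buf`, starting at `offset`.
    // Stands in for the deprecated new String(buf, 0 /* hibyte */, offset, length).
    static String decode(byte[] buf, int offset, int length) {
        // Assumption: the data is single-byte text, so ISO-8859-1 is used.
        // new String(buf, offset, length) also exists and uses the platform default charset.
        return new String(buf, offset, length, StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        byte[] buf = "xxHELLOyy".getBytes(StandardCharsets.ISO_8859_1);
        // Skip the two leading bytes and decode the next five.
        System.out.println(decode(buf, 2, 5));
    }
}
```

If your data file uses a different encoding, pass that charset instead.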
Also when does it make sense to buffer I/O off the disk?
This is something that will differ from application to application. That is why, in the previous topic, I pointed you to resources to help you understand and quantify what is going on in your application. Without doing some homework and making some measurements, you'll be stumbling around in the dark.
Note that the buffer size in Braf is variable. You may get a substantial performance
boost just from setting the buffer size to the approximate size of a record (for you, ~1k). By default, RAF's readLine() reads a line one byte at a time. Reading the data in a single block would be much faster, and we know you will read 10 million lines of roughly 1k each, so that's a few billion individual read calls you could save right there.
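To illustrate the single-block idea without Braf, here is a minimal sketch using plain RandomAccessFile. The class name, method names, and the 1024-byte buffer are my own assumptions, not anything from your code, and it assumes a record never exceeds the buffer and lines end in '\n':

```java
import java.io.File;
import java.io.IOException;
import java.io.RandomAccessFile;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

public class BlockRead {
    // Hypothetical helper: seek to `pos`, fill a buffer of `size` bytes with bulk
    // reads, and return the text up to (not including) the first '\n'.
    static String readLineAt(RandomAccessFile raf, long pos, int size) throws IOException {
        byte[] buf = new byte[size];
        raf.seek(pos);
        int n = 0;
        while (n < size) {                       // read() may return fewer bytes than asked
            int r = raf.read(buf, n, size - n);
            if (r < 0) break;                    // end of file
            n += r;
        }
        if (n == 0) return null;                 // nothing left to read
        int end = 0;
        while (end < n && buf[end] != '\n') end++;
        // A line longer than `size` is silently truncated in this sketch.
        return new String(buf, 0, end, StandardCharsets.ISO_8859_1);
    }

    // Writes two short lines to a temp file and reads back the second one.
    static String demo() {
        try {
            File f = File.createTempFile("records", ".txt");
            f.deleteOnExit();
            try (RandomAccessFile raf = new RandomAccessFile(f, "rw")) {
                raf.write("alpha\nbeta\n".getBytes(StandardCharsets.ISO_8859_1));
                return readLineAt(raf, 6, 1024); // byte 6 is where the second line starts
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

Each record costs one seek and typically one bulk read here, instead of one read call per byte. Whether 1024 is the right size is exactly the kind of thing you need to measure.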
If your records are grouped, so that several consecutive indexes give you hits within a few kilobytes of each other in the data file, it would make sense to find some balance between loading x records into the buffer and the time and memory it takes to load the buffer. If you don't have such grouping, then making the buffer larger than 1k would just waste memory and time. These are factors which need to be tested and tuned. There is no one answer.
Again, the order in which you will gain performance is:
1. Hardware. Without fast hardware, software is slow. Period.
2. Hardware. The cost/performance gain is more justifiable than paying you to tweak code.
3. Hardware. 10 million anything takes time. Double your disk throughput and you will likely double your program's execution speed. Try getting that improvement through tweaking code.
4. Software.