Buffered RAF

 
Ranch Hand
Posts: 56
Hi,

Does anyone here know of any *working* and/or *tested* implementations of a buffered RandomAccessFile? I know of one from here (but that one seems very old; it uses deprecated API calls).

Also, when does it make sense to buffer disk I/O? For instance, if my code makes more than 10 million seek() calls on a file, each fetching a line from a different byte offset, won't buffering worsen performance, simply because each read now takes one additional step (copying through the buffer) before it delivers the output?

Thanks,
Prashant.
 
Bartender
Posts: 9626
This topic is a continuation of a previous topic on seek() vs. seekBytes()

Originally posted by Prashant Sehgal:
(but this one seems very old...it's using deprecated API).


Did you try replacing the deprecated methods with the new API? Only two of the calls are deprecated, both String constructors, and the current version of String has constructors that take a byte array, an offset, and a length as arguments.
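As a sketch of that swap (the sample data here is made up for illustration): the deprecated form is the old "hibyte" constructor, and the replacement takes the byte array, offset, length, and ideally an explicit charset.

```java
import java.nio.charset.StandardCharsets;

public class StringFix {
    public static void main(String[] args) {
        byte[] raw = "hello world".getBytes(StandardCharsets.US_ASCII);

        // Deprecated since JDK 1.1 (assumes one byte per char, hibyte = 0):
        // String s = new String(raw, 0, 0, raw.length);

        // Modern replacement: byte array + offset + length + charset
        String s = new String(raw, 0, raw.length, StandardCharsets.US_ASCII);
        System.out.println(s);
    }
}
```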



Also when does it make sense to buffer I/O off the disk?


This is something that will differ from application to application. That is why, in the previous topic, I pointed you to resources to help you understand and quantify what is going on in your application. Without doing some homework and making some measurements, you'll be stumbling around in the dark.
Note that the buffer size in Braf is variable. You may get a substantial performance boost just from setting the buffer size to the approximate size of a record (for you, ~1 KB). By default, RAF reads a line one byte at a time; reading the data in a single block is much faster, and since you will read 10 million lines, that's a few billion disk accesses you could save right there.
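To illustrate the single-block idea, here is a minimal sketch assuming hypothetical fixed-width records of 16 bytes (the record size and file layout are invented for the example): one seek() plus one readFully() fetches the whole record, instead of readLine() pulling it a byte at a time.

```java
import java.io.RandomAccessFile;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

public class BlockRead {
    public static void main(String[] args) throws IOException {
        // Build a throwaway data file: 100 fixed-width 16-byte records.
        Path tmp = Files.createTempFile("records", ".dat");
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < 100; i++) {
            sb.append(String.format("%-15d\n", i));   // 15 chars + newline = 16 bytes
        }
        Files.write(tmp, sb.toString().getBytes(StandardCharsets.US_ASCII));

        final int RECORD = 16;
        try (RandomAccessFile raf = new RandomAccessFile(tmp.toFile(), "r")) {
            byte[] buf = new byte[RECORD];
            raf.seek(42L * RECORD);   // jump straight to record 42
            raf.readFully(buf);       // one block read, not 16 single-byte reads
            String line = new String(buf, StandardCharsets.US_ASCII).trim();
            System.out.println(line); // prints "42"
        }
        Files.delete(tmp);
    }
}
```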
If you have some grouping of records, where several consecutive indexes give you hits within a few kilobytes of each other in the data file, it makes sense to find a balance between loading x records into the buffer and the time and memory it takes to fill that buffer. If you don't have such grouping, then a buffer larger than 1 KB just wastes memory and time. These are factors which need to be tested and tuned; there is no one answer.
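One way to do that testing is a rough timing harness. This is a hypothetical sketch (the file size, buffer sizes, and read count are arbitrary placeholders, not recommendations): it times N random block reads for each candidate buffer size so you can compare them on your own hardware and data.

```java
import java.io.RandomAccessFile;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Random;

public class BufferTuning {
    // Time n random reads of bufSize bytes each from the given file.
    static long timeReads(Path file, int bufSize, long fileLen, int n) throws IOException {
        byte[] buf = new byte[bufSize];
        Random rnd = new Random(1);       // fixed seed so runs are comparable
        long start = System.nanoTime();
        try (RandomAccessFile raf = new RandomAccessFile(file.toFile(), "r")) {
            for (int i = 0; i < n; i++) {
                long off = (long) (rnd.nextDouble() * (fileLen - bufSize));
                raf.seek(off);
                raf.readFully(buf);       // bigger bufSize = more bytes pulled per seek
            }
        }
        return System.nanoTime() - start;
    }

    public static void main(String[] args) throws IOException {
        Path tmp = Files.createTempFile("tune", ".dat");
        byte[] data = new byte[1 << 20];  // 1 MB of dummy data
        Files.write(tmp, data);
        for (int size : new int[]{1_024, 4_096, 16_384}) {
            long ns = timeReads(tmp, size, data.length, 10_000);
            System.out.printf("buffer %6d bytes: %d ms%n", size, ns / 1_000_000);
        }
        Files.delete(tmp);
    }
}
```

Numbers from a toy file like this won't match your real workload; rerun it against your actual data file and record size before drawing conclusions.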
Again, the order in which you will gain performance is:
1. Hardware. Without fast hardware, software is slow. Period.
2. Hardware. The cost/performance gain is more justifiable than paying you to tweak code.
3. Hardware. 10 million of anything takes time. Double your disk throughput and you will likely double your program's execution speed. Try getting that improvement by tweaking code.
4. Software.