[algorithm][idea] How to avoid a lot of file seeks?

We have large files that do not fit in memory. B+-tree indexes exist, but finding a record requires a large file seek. One seek is not equivalent to reading a 4 KiB block: a seek costs about 10 milliseconds, while sequential read speed is about 50 MB/s, so one seek takes as long as reading 500 kB. Even more, a mechanical disk is better at reading continuous data than at seeking.
For example: the Bitcoin block chain is about 100 GB of data, with 200 million transactions and 500 million inputs + 500 million outputs = one billion scripts.
Assume we do not use a database system but work directly on the file system with our own algorithms: the 200 million transactions are indexed, and now we must find the addresses of the inputs. Each input has no address, only a link to a transaction. We would need 500 million seeks, which takes as long as reading 250 TB of data (!) instead of "only" the 100 GB of data.
Which ideas could decrease the seek count? Maybe delay using a found transaction, so we can accumulate about 1000 lookups of transactions that lie near one another and resolve them together (see the sketch below)?
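
A minimal sketch of that batching idea, assuming each input's transaction offset is already known from the B+-tree index; TxRequest, lookupOffset semantics, the fixed record size, and onTransactionLoaded(...) are hypothetical names used only for illustration, not a definitive design:

import java.io.IOException;
import java.io.RandomAccessFile;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Instead of seeking for every input immediately, collect a batch of
// transaction lookups, sort them by file offset, and resolve them in one
// forward-only sweep over the file, so the disk head never jumps backwards.
class BatchedReader {
    static class TxRequest {
        final long txId;
        final long fileOffset;   // offset found in the B+-tree index
        TxRequest(long txId, long fileOffset) {
            this.txId = txId;
            this.fileOffset = fileOffset;
        }
    }

    private static final int BATCH_SIZE = 1000;
    private final List<TxRequest> pending = new ArrayList<>();

    // Queue a lookup instead of seeking right away.
    void request(long txId, long fileOffset, RandomAccessFile file) throws IOException {
        pending.add(new TxRequest(txId, fileOffset));
        if (pending.size() >= BATCH_SIZE) {
            flush(file);
        }
    }

    // Sort the batch by offset so seeks become short forward jumps.
    void flush(RandomAccessFile file) throws IOException {
        pending.sort(Comparator.comparingLong(r -> r.fileOffset));
        byte[] record = new byte[256];            // assumed fixed record size
        for (TxRequest r : pending) {
            file.seek(r.fileOffset);
            file.readFully(record);
            onTransactionLoaded(r.txId, record);  // hypothetical callback
        }
        pending.clear();
    }

    void onTransactionLoaded(long txId, byte[] record) {
        // process the resolved transaction here
    }
}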
How can we profit from a memory cache? Memory might hold 1%, 10%, or 50% of the big file. A naive seek does not distinguish whether the wanted record is already in memory (a possible cache sketch follows below).
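
One possible way to exploit that, sketched under the assumption of fixed-size records keyed by file offset: an LRU cache built on LinkedHashMap's access order, so the expensive seek only happens on a cache miss. The capacity and readFromDisk(...) are placeholders, not a real API:

import java.util.LinkedHashMap;
import java.util.Map;

// Check the in-memory cache before seeking; evict the least recently
// used record when the cache grows beyond maxEntries.
class RecordCache {
    private final int maxEntries;
    private final Map<Long, byte[]> cache;

    RecordCache(int maxEntries) {
        this.maxEntries = maxEntries;
        this.cache = new LinkedHashMap<Long, byte[]>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<Long, byte[]> eldest) {
                return size() > RecordCache.this.maxEntries;
            }
        };
    }

    // Only touch the disk when the record is not already in memory.
    byte[] get(long fileOffset) {
        byte[] record = cache.get(fileOffset);
        if (record == null) {
            record = readFromDisk(fileOffset);   // the expensive seek happens only here
            cache.put(fileOffset, record);
        }
        return record;
    }

    private byte[] readFromDisk(long fileOffset) {
        // placeholder: seek to fileOffset and read the record from the big file
        return new byte[0];
    }
}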
 