• Post Reply Bookmark Topic Watch Topic
  • New Topic

Deleting from a RandomAccessFile  RSS feed

 
Flo Powers
Ranch Hand
Posts: 57
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
Hi, I have a program that writes fixed length records to a RandomAccessFile using a hashing algorithm on a key field. This works fine, but I'm unsure how best to implement deletion of records.

People have suggested different ways to deal with this, but I'm unsure which one is best, or if there is an even better method. What are the issues involved?

# Method 1: overwriting records with empty bytes (and then the program needs to check for that before writing - if empty bytes in a position, write there, otherwise, probe or go to overflow area for free record).

# Method 2: Keep a linked list or array (or similar) in which you store the byte positions of records which are to be treated as deleted. If the hash algorithm assigns a new record to a position which has been used before, but which was then 'deleted' (i.e. the byte position was entered in the list of records tagged for deletion), the program will simply just overwrite the previous record. This, in other words, is a kind of 'flagging' of records to be deleted without actually physically removing them. If outputting records, this record will be ignored as it is entered in the 'deleted records' list. One thing to keep in mind is that the contents of the list need to be written to secondary storage along with the RandomAccessFile: if a record is flagged, but not deleted, the program needs to remember that the record is flagged for deletion next time it is run as well.

Any comments on these methods, or suggestions for better ways?
Thank you very much!
[ December 13, 2005: Message edited by: Flo Powers ]
 
Stan James
(instanceof Sidekick)
Ranch Hand
Posts: 8791
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
You can keep a linked list inside the file ... each valid record can have the byte offset of the next valid record, and each empty (deleted or never used)record can have the byte offset of the next empty record. Insert two short fields at the top of the file with the byte offset of the first valid record and the first deleted record. When you need to insert a new record, see if there is a deleted slot to put it in.

Sophisticated RandomAccessFile work will eventually take you to something like IBM's Virtual Storage Access Method. With a lot of mental effort I could drag up what all of these mean (I just googled for the list) ...

I remember ... a binary tree index stored within the file takes you to the right interval (block of records with a range of keys) for your key, then a nested tree indexes the keys within the interval and freespace within the interval. To insert a key in the range you use up some of the free space. When an interval runs out of free space you split it into two intervals using up some of the area (block of intervals) freespace. When an area runs out of freespace you split it into two areas with the new one on the end of the file. CA splits are bad. It's all rushing back! The fun part is you get to say HURBA and HARBA a lot - High Used Relative Byte Address and High Allocated Relative Byte Address.
 
Flo Powers
Ranch Hand
Posts: 57
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
Groan... that looks rather hefty. Despite the lure of HURBAs and HARBAs, I think I'll stick to my (conceptually speaking) simpler method, less efficient though it may be. I like the first point, though, with the linked list implementation within the RAF.

[EDIT]: Wait, on second thought, your idea with the linked list looks really interesting. When you say "byte offset", does that refer to the "absolute" byte position (i.e. number of bytes from beginning of file)?

Thank you very much!
[ December 13, 2005: Message edited by: Flo Powers ]
 
Paul Clapham
Sheriff
Posts: 22816
43
Eclipse IDE Firefox Browser MySQL Database
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
Or there's method #3, where each record contains a one-byte "Deleted" flag which is 1 for "deleted" and 0 for "not deleted". This was the technique used by dBase back in the 1970's when databases were just being invented... speaking of which, you could use a database. It would take care of that particular problem and the other problems you haven't come across yet.
 
Flo Powers
Ranch Hand
Posts: 57
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
Ah, of course, that's a very simple way of doing it. Duh! Thanks )
 
Stan James
(instanceof Sidekick)
Ranch Hand
Posts: 8791
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
Good tip, Paul. I rambled on much more than I should have before. I guess my what I forgot to say was that a really robust RAF solution will be really complicated. I sincerely hope a simpler solution works well for you!
[ December 15, 2005: Message edited by: Stan James ]
 
Junilu Lacar
Sheriff
Posts: 11476
180
Android Debian Eclipse IDE IntelliJ IDE Java Linux Mac Spring Ubuntu
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
Ah, the deleted flag in dBase... you're making me wax nostalgic...
 
  • Post Reply Bookmark Topic Watch Topic
  • New Topic
Boost this thread!