Gurus,
I've read most of the posts here relating to reading and writing bytes to/from the data file. This is what I've come up with and I want to make sure that I'm not doing anything blatantly idiotic. First, I'll post the data file format and then my assumptions.
**** Data File Format Start ****
Start of file
4 byte numeric, magic cookie value identifies this as a data file
4 byte numeric, offset to start of record zero
2 byte numeric, number of fields in each record
Schema description section.
Repeated for each field in a record:
2 byte numeric, length in bytes of field name
n bytes (defined by previous entry), field name
2 byte numeric, field length in bytes
end of repeating block
Data section. (offset into file equal to "offset to start of record zero" value)
Repeat to end of file:
2 byte flag. 00 implies valid record, 0x8000 implies deleted record
Record containing fields in order specified in schema section, no separators between fields, each field fixed length at maximum specified in schema information
End of file
All numeric values are stored in the header information and use the formats of the DataInputStream and DataOutputStream classes. All text values, and all fields (which are text only), contain only 8 bit characters, null terminated if less than the maximum length for the field. The character encoding is 8 bit US ASCII.
**** Data File Format End ****
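To make the format concrete, here is a rough sketch of how I picture reading the header. The class name SchemaSketch and all of its members are my own invention, not part of the assignment. I'm assuming the 2 byte lengths are never negative, so I read them with readUnsignedShort. The method takes a DataInput, which both RandomAccessFile and DataInputStream implement. StandardCharsets.US_ASCII (Java 7 and later) is the same charset as the "US-ASCII" name.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;

// Hypothetical holder for the header; none of these names come from the assignment.
class SchemaSketch {
    final int magicCookie;
    final int recordZeroOffset;
    final String[] fieldNames;
    final int[] fieldLengths;

    SchemaSketch(int magicCookie, int recordZeroOffset,
                 String[] fieldNames, int[] fieldLengths) {
        this.magicCookie = magicCookie;
        this.recordZeroOffset = recordZeroOffset;
        this.fieldNames = fieldNames;
        this.fieldLengths = fieldLengths;
    }

    // Reads the header and schema section in the order the format lists them.
    static SchemaSketch read(DataInput in) throws IOException {
        int magicCookie = in.readInt();            // 4 byte magic cookie
        int recordZeroOffset = in.readInt();       // 4 byte offset to record zero
        int fieldCount = in.readUnsignedShort();   // 2 byte number of fields
        String[] names = new String[fieldCount];
        int[] lengths = new int[fieldCount];
        for (int i = 0; i < fieldCount; i++) {
            int nameLength = in.readUnsignedShort();   // 2 byte length of field name
            byte[] nameBytes = new byte[nameLength];
            in.readFully(nameBytes);                   // blocks until all n bytes are read
            names[i] = new String(nameBytes, StandardCharsets.US_ASCII);
            lengths[i] = in.readUnsignedShort();       // 2 byte field length
        }
        return new SchemaSketch(magicCookie, recordZeroOffset, names, lengths);
    }

    // Bytes per record on disk: the 2 byte flag plus every fixed-length field.
    int recordLength() {
        int total = 2;
        for (int length : fieldLengths) {
            total += length;
        }
        return total;
    }

    public static void main(String[] args) throws IOException {
        // Build a tiny made-up header in memory instead of opening the real data file.
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        DataOutputStream out = new DataOutputStream(buffer);
        out.writeInt(0x0101);      // made-up magic cookie
        out.writeInt(26);          // header is 10 bytes plus 8 bytes per field here
        out.writeShort(2);         // two fields
        out.writeShort(4); out.writeBytes("name"); out.writeShort(32);
        out.writeShort(4); out.writeBytes("city"); out.writeShort(64);

        SchemaSketch schema = read(
                new DataInputStream(new ByteArrayInputStream(buffer.toByteArray())));
        System.out.println(schema.fieldNames[0] + " " + schema.fieldLengths[0]
                + ", record length " + schema.recordLength());
    }
}
```

With a real file I would pass the RandomAccessFile itself to read, then seek to recordZeroOffset plus the record number times recordLength() to reach a given record.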
- for the numeric values, I should be using RandomAccessFile#readInt and #readShort
- the valid record flag should equal a string of "\u0000\u0000" and the delete field flag should equal a string of "\u8000"
- I should be using RandomAccessFile#readFully instead of #read when loading my byte[] objects
- When I convert the bytes I read into a String, I should use new String(bytes, "US-ASCII"), and strObj.getBytes("US-ASCII") on writes
- "US-ASCII" is really 7 bit and I need 8 bit. Am I missing something here or do I need another encoding?
- I'm not sure of the best way to handle my delete flag writes. RandomAccessFile#writeChars("\u8000")?
- Even though it's been highly debated, I think I'll refrain from trimming the trailing spaces that follow many of the values in the data file when I read them into memory.
- When reading in the field values, I'll have to loop through the chars and find the first null; everything before that will be my field value.
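
For the flag and field handling, here is a sketch of one alternative I'm weighing: treating the flag as a 2 byte number (readUnsignedShort / writeShort) rather than comparing strings. That is my own assumption, not something the format description says. The class and method names are made up, and the charset is a parameter because I haven't settled the encoding question. As far as I know, Java's US-ASCII decoder replaces any byte above 0x7F with U+FFFD, while ISO-8859-1 maps every byte value to a character, so the latter might be what "8 bit" needs.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical helpers; none of these names come from the assignment.
class RecordSketch {
    static final int VALID_FLAG = 0x0000;
    static final int DELETED_FLAG = 0x8000;

    // Reads the 2 byte flag as an unsigned value so 0x8000 compares directly.
    static boolean isDeleted(DataInput in) throws IOException {
        return in.readUnsignedShort() == DELETED_FLAG;
    }

    // writeShort writes the low 16 bits, high byte first: 0x80 then 0x00.
    static void writeDeletedFlag(DataOutput out) throws IOException {
        out.writeShort(DELETED_FLAG);
    }

    // Decodes a fixed-length field, stopping at the first null byte if there is one.
    // Trailing spaces are deliberately left alone.
    static String decodeField(byte[] raw, Charset charset) {
        int end = 0;
        while (end < raw.length && raw[end] != 0) {
            end++;
        }
        return new String(raw, 0, end, charset);
    }

    // Encodes a value into exactly `length` bytes, padding with null bytes.
    static byte[] encodeField(String value, int length, Charset charset) {
        byte[] encoded = value.getBytes(charset);
        if (encoded.length > length) {
            throw new IllegalArgumentException("value longer than field: " + value);
        }
        return Arrays.copyOf(encoded, length);   // copyOf zero-fills the extra bytes
    }

    public static void main(String[] args) throws IOException {
        // Round-trip one made-up record through an in-memory buffer.
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        DataOutputStream out = new DataOutputStream(buffer);
        writeDeletedFlag(out);
        out.write(encodeField("Tim", 8, StandardCharsets.US_ASCII));

        DataInputStream in = new DataInputStream(
                new ByteArrayInputStream(buffer.toByteArray()));
        boolean deleted = isDeleted(in);
        byte[] raw = new byte[8];
        in.readFully(raw);
        System.out.println(deleted + " [" + decodeField(raw, StandardCharsets.US_ASCII) + "]");
    }
}
```

One thing I noticed while writing this: writeChars writes 2 bytes per char, so writeChars("\u8000") should produce the same two bytes as writeShort(0x8000), but writeChars("\u0000\u0000") would write 4 bytes, which is too many for the 2 byte valid flag.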
Thanks a lot gang.
-Tim