
B&S: Data File Format

 
Alain Dickson
Ranch Hand
Posts: 53
Hi all, I need help understanding the data file and its format. I have never worked with files in real life, and this is the only part that does not fit into my understanding.

FORMAT DESCRIPTION IN INSTRUCTIONS
(I have inserted dashed lines with section names for ease of asking questions):
--------------------SECTION 1--------------------------------
Start of file
4 byte numeric, magic cookie value identifies this as a data file
4 byte numeric, offset to start of record zero
2 byte numeric, number of fields in each record
-------------------SECTION 1 ENDS--------------------------------

-----------------SECTION 2--------------------------------------
Schema description section.
Repeated for each field in a record:
2 byte numeric, length in bytes of field name
n bytes (defined by previous entry), field name
2 byte numeric, field length in bytes
end of repeating block
----------------SECTION 2 ENDS------------------------------------

----------------SECTION 3-----------------------------------------
Data section. (offset into file equal to "offset to start of record zero" value)
Repeat to end of file:
2 byte flag. 00 implies valid record, 0x8000 implies deleted record
Record containing fields in order specified in schema section, no separators between fields, each field fixed length at maximum specified in schema information
----------------SECTION 3 ENDS----------------------------------------

---------------
ACTUAL DATA FILE SUPPLIED WITH MY ASSIGNMENT (HEADER AND SOME OF THE DATA) AS VIEWED IN WORDPAD. (IT LOOKS DIFFERENT IN A HEX EDITOR AND IN JEDIT.) I hope my questions can be answered with this view.

Fname location@
specialties@sizerateownerDogs With Tools Smallville Roofing 7 $35.00 Hamner & Tong Smallville Drywall, Roofing 10 $85.00

--------------------------

QUESTIONS:
1. Can you please separate the different sections of the actual data file according to the file format (e.g. which part is the "magic cookie", the "offset", etc.)?
2. Where is the 2 byte flag which indicates a valid/deleted record?

Many thanks,
Alain
 
Jeffry Kristianto Yanuar
Ranch Hand
Posts: 759
Hi friend, welcome to JavaRanch, such a lovely place for Java programmers.

2 byte flag. 00 implies valid record, 0x8000 implies deleted record
Record containing fields in order specified in schema section, no separators between fields, each field fixed length at maximum specified in schema information


This is the 2 byte flag which indicates valid/deleted record.

First you have to open a RandomAccessFile that points to the database file.


4 byte numeric, magic cookie value identifies this as a data file
4 byte numeric, offset to start of record zero
2 byte numeric, number of fields in each record


Let's start with the above section.

To read a 4 byte numeric you use the readInt() method of RandomAccessFile. Why readInt()? Because 4 bytes is the size of an int (32 bits). The method returns the int read from those 4 bytes.

The sample code is:

// create the RandomAccessFile object first
int magicCookie = randomAccessFile.readInt();        // 4 byte magic cookie
int offset = randomAccessFile.readInt();             // 4 byte offset to start of record zero
short numberOfFields = randomAccessFile.readShort(); // a short is 2 bytes (16 bits)


The rest is similar.
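
To show what "the rest" could look like, here is a minimal sketch that reads sections 1 and 2 of the format as described in the first post. The class name `DataFileHeader` and its members are made up for illustration, and decoding field names as US-ASCII is an assumption; the instructions quoted above do not state an encoding. It accepts any `DataInput`, which `RandomAccessFile` implements.

```java
import java.io.DataInput;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical holder for the header and schema sections; not part of the assignment.
class DataFileHeader {
    final int magicCookie;
    final int recordZeroOffset;
    // field name -> field length in bytes, kept in file order
    final Map<String, Integer> fieldLengths;

    private DataFileHeader(int magicCookie, int recordZeroOffset,
                           Map<String, Integer> fieldLengths) {
        this.magicCookie = magicCookie;
        this.recordZeroOffset = recordZeroOffset;
        this.fieldLengths = fieldLengths;
    }

    static DataFileHeader read(DataInput in) throws IOException {
        int magicCookie = in.readInt();          // 4 byte magic cookie
        int recordZeroOffset = in.readInt();     // 4 byte offset to record zero
        int fieldCount = in.readUnsignedShort(); // 2 byte number of fields

        Map<String, Integer> fieldLengths = new LinkedHashMap<>();
        for (int i = 0; i < fieldCount; i++) {
            int nameLength = in.readUnsignedShort();  // 2 byte length of field name
            byte[] nameBytes = new byte[nameLength];
            in.readFully(nameBytes);                  // n bytes, field name
            // Encoding is an assumption; the format description does not name one.
            String name = new String(nameBytes, StandardCharsets.US_ASCII);
            int fieldLength = in.readUnsignedShort(); // 2 byte field length
            fieldLengths.put(name, fieldLength);
        }
        return new DataFileHeader(magicCookie, recordZeroOffset, fieldLengths);
    }
}
```

With a file opened as `new RandomAccessFile(path, "r")`, calling `DataFileHeader.read(file)` from position 0 should leave the file pointer at the end of the schema section, which the format says is where record zero starts.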

Hope that helps.

Jeffry Kristianto Yanuar (Java Instructor)
SCJP 5.0, SCJA, SCJD (UrlyBird 1.3.2) --> Waiting for the result
 
Alain Dickson
Ranch Hand
Posts: 53
Thanks for the insight Jeffry,
I printed the data file as a string to console using RandomAccessFile.

What I found is:
1. The first four bytes showed something which looked like the magic cookie, i.e. two faces. I guess that's fine; I can save those bytes and compare them whenever a data file is accessed.

2. But I could not find anything corresponding to the delete flag. Is it at the end of every record? I calculated the bytes for all the fields and there are two extra bytes at the end of every record, but they don't display anything. Can I use those two bytes to write this flag? Are those two bytes meant for that, or will I be altering the format of the data file by writing over them (which I am not supposed to do)?

Please give your feedback.

Thanks,
Alain
 
Jeffry Kristianto Yanuar
Ranch Hand
Posts: 759
2. But I could not find anything corresponding to the delete flag. Is it at the end of every record? I calculated the bytes for all the fields and there are two extra bytes at the end of every record, but they don't display anything. Can I use those two bytes to write this flag? Are those two bytes meant for that, or will I be altering the format of the data file by writing over them (which I am not supposed to do)?


The flag is at the beginning of each record, not at the end of each record.


Please try again

 
Alain Dickson
Ranch Hand
Posts: 53
Thanks once again Jeffry, I seem to understand the data file now, with one doubt left.

You are absolutely right, the delete flag is at the beginning of each record.
Actually I was checking records from the middle of the file, so the beginning of one record was the end of another (I found two empty bytes between two records).

The doubt:
Does the valid record flag 00 mean empty bytes (nothing written to them), OR should I write 00 to them? And of course I will be writing 0x8000 if I have to mark a record as deleted.

Thanks,
Alain
 
Jeffry Kristianto Yanuar
Ranch Hand
Posts: 759
The doubt:
Does the valid record flag 00 mean empty bytes (nothing written to them), OR should I write 00 to them? And of course I will be writing 0x8000 if I have to mark a record as deleted.


Yes, if you create a new record, you write the 00 flag and all the record's fields. When you delete, you write the 0x8000 flag. So deleting a record doesn't remove any bytes, it just changes the flag.
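
A minimal sketch of those two operations, assuming the caller already knows the byte position where the record starts. The class, constant, and method names are made up for illustration and are not from the assignment.

```java
import java.io.IOException;
import java.io.RandomAccessFile;

// Hypothetical helper; names are illustrative only.
class DeleteFlagWriter {
    static final int VALID_FLAG = 0x0000;
    static final int DELETED_FLAG = 0x8000;

    /** Writes the valid flag followed by the already-padded field bytes. */
    static void writeNewRecord(RandomAccessFile file, long recordPosition,
                               byte[] recordBytes) throws IOException {
        file.seek(recordPosition);
        file.writeShort(VALID_FLAG); // two bytes: 00 00
        file.write(recordBytes);
    }

    /** Overwrites only the 2 byte flag; the field bytes stay in the file. */
    static void markDeleted(RandomAccessFile file, long recordPosition)
            throws IOException {
        file.seek(recordPosition);
        file.writeShort(DELETED_FLAG); // two bytes: 80 00
    }
}
```

After `markDeleted`, the file length is unchanged and the old field values are still on disk behind the flag, which is what makes it possible to reuse the slot for a new record later.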

Finding out how to read the database file is the first thing I did in my assignment. So if you already know how to read it, I'm sure you'll know how to write it. Using RandomAccessFile makes it easy to move backward and forward to a certain byte.

Good Luck and wish me luck too !!!


[ December 11, 2008: Message edited by: Jeffry Kristianto Yanuar ]
 
Alain Dickson
Ranch Hand
Posts: 53
Thanks a lot Jeffry for your help.
I will start coding in a couple of days, and I guess I will need a lot of help from the Ranch.

Wish you all the best for your result
I am sure you will make it.
Let us know once you get your result.

Thanks,
Alain
 
Rajesh Moorthy
Ranch Hand
Posts: 30
Does the valid record flag 00 mean empty bytes (nothing written to them), OR should I write 00 to them? And of course I will be writing 0x8000 if I have to mark a record as deleted.


1) While reading the unedited database file, we should treat a record as valid if the flag contains empty bytes.
2) While writing a valid record, we should prefix 00 to the record.
3) This means that, while reading an edited database file, we should treat both empty bytes and 00 as flags for valid records.
4) In this case, why should we prefix 00 for a valid record? Why shouldn't we use empty bytes for valid records in all cases?

In other words, why should we use the concept of 00 at all, when the original database uses empty bytes as the valid record flag?

Thanks,
Rajesh.
 
K. Tsang
Bartender
Posts: 3583
When reading and writing files, it really depends on which IO class you are using. If you use XXInputStream and XXOutputStream, then reading and writing are split between those classes. If you use, say, RandomAccessFile, which supports both reading and writing, then in my opinion it will make life easier.

When you first read the file, the header is read. On subsequent accesses, you should be able to just jump to a particular record and read the delete flag and the data. The same goes for writing.
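
A minimal sketch of that "jump to a record" step, assuming the offset to record zero and the field lengths were already read from the header. The class and method names are hypothetical, and decoding the values as US-ASCII and trimming the padding are assumptions; the format description only says each field is fixed length.

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.charset.StandardCharsets;

// Hypothetical helper; names are illustrative only.
class RecordJump {
    static final int FLAG_SIZE = 2;

    /** Byte position of a record: every record is flag + all fixed-length fields. */
    static long recordPosition(long recordZeroOffset, int[] fieldLengths, int recordNumber) {
        int recordLength = 0;
        for (int length : fieldLengths) {
            recordLength += length;
        }
        return recordZeroOffset + (long) recordNumber * (FLAG_SIZE + recordLength);
    }

    /** Returns the field values, or null if the flag is anything other than 0 (valid). */
    static String[] readRecord(RandomAccessFile file, long recordZeroOffset,
                               int[] fieldLengths, int recordNumber) throws IOException {
        file.seek(recordPosition(recordZeroOffset, fieldLengths, recordNumber));
        int flag = file.readUnsignedShort();
        if (flag != 0) {
            return null; // deleted (0x8000) or an unexpected value
        }
        String[] values = new String[fieldLengths.length];
        for (int i = 0; i < fieldLengths.length; i++) {
            byte[] raw = new byte[fieldLengths[i]];
            file.readFully(raw);
            // Assumed: ASCII text padded with spaces or zero bytes, which trim() strips.
            values[i] = new String(raw, StandardCharsets.US_ASCII).trim();
        }
        return values;
    }
}
```

Because every record has the same size, no scan is needed: the position is a single multiplication away from the record number.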
 
Alain Dickson
Ranch Hand
Posts: 53
Rajesh - try having a look at the data file in a hex editor. The empty bytes are shown as 00 00.

When you write to the file using RandomAccessFile's writeShort(00), it produces the same result in the data file. I just learned this by writing to the file in different ways.

It is good to understand the whys and hows of things, but sometimes just making things work and moving forward is a good idea for this assignment.

I understand that when we are working on this assignment we tend to get into the detail of everything, but trust me, don't get too emotional about this assignment. Make things work right and let it go. If searched carefully, this forum quickly tells you how to achieve the desired results.

I hope this helps.
 
Rajesh Moorthy
Ranch Hand
Posts: 30
By using RandomAccessFile.readShort(), the valid flag is being displayed as "0". However, it is not possible to use this method for the delete flag because:
1) range of short is between -32768 to 32767
2) delete flag = 0x8000 = 32768. This is outside the range of short.

What are the other ways to handle this scenario?

Thanks,
Rajesh.
 
Rajesh Moorthy
Ranch Hand
Posts: 30
Hey dudes, could anyone please provide inputs for the above question?

Thanks,
Rajesh.
 
Roberto Perillo
Bartender
Posts: 2271
Hey, Rajesh!

What if you tried something like this (this is how I did it):



Instead of using readShort(), use read(), specifying the flag's size in bytes.
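
The snippet posted here did not survive in this copy of the thread. As a stand-in, and not a reconstruction of the original code, here is a minimal sketch of reading the flag through `read(byte[], int, int)`. The names `database`, `flagBuffer`, `FLAG_SIZE` and `eof` are taken from the one line quoted in a later reply; everything else, including the class and method names, is an assumption.

```java
import java.io.EOFException;
import java.io.IOException;
import java.io.RandomAccessFile;

// Hypothetical helper; not the code from the original post.
class FlagBufferReader {
    static final int FLAG_SIZE = 2;

    /** Returns the flag as a value in 0..65535, or -1 if the file has ended. */
    static int readFlag(RandomAccessFile database) throws IOException {
        byte[] flagBuffer = new byte[FLAG_SIZE];
        // read() returns the number of bytes read, or -1 at end of file;
        // it does not return the flag value itself.
        final int eof = database.read(flagBuffer, 0, FLAG_SIZE);
        if (eof == -1) {
            return -1;
        }
        if (eof < FLAG_SIZE) {
            // Simplification: a short read is treated as a truncated file.
            throw new EOFException("file ends in the middle of a flag");
        }
        // Combine the two bytes, high byte first, without sign extension.
        return ((flagBuffer[0] & 0xFF) << 8) | (flagBuffer[1] & 0xFF);
    }
}
```

One thing this approach gives you that readShort() does not is a clean end-of-file signal: readShort() throws EOFException, while read() reports -1, which is convenient in a "repeat to end of file" loop.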
 
Alain Dickson
Ranch Hand
Posts: 53
Hi Rajesh, Sorry for delayed response man! I was too busy.

1) range of short is between -32768 to 32767
2) delete flag = 0x8000 = 32768. This is outside the range of short.


Just write a small program and try writeShort(0x8000). Since this method accepts a "short" argument, the compiler will not allow anything larger than a short without a cast.

I did writeShort(0x8000) for deleting a record and writeShort(00) for marking a record as valid.

It will work, and you will not have any problems due to this...

Alain Dickson,
SCJP 6, SCJD

 
Rajesh Moorthy
Ranch Hand
Posts: 30
Thanks for your responses.

Here's my analysis:

1) We have to read 2 bytes only. This is the requirement. Therefore, we can read only "short" and not "int".

Hence, the following code may not fetch the intended result:
final int eof = database.read(flagBuffer, 0, FLAG_SIZE);

Please correct me if I am wrong.

2) The method writeShort() in RandomAccessFile accepts an "int" argument and not a "short" argument. This is the reason why writeShort(0x8000) works, even though the value is greater than the range of "short".


1) range of short is between -32768 to 32767
2) delete flag = 0x8000 = 32768. This is outside the range of short.


While reading the value, using readShort() will result in -0x8000 (-32768). For getting 0x8000 (32768), we should use readUnsignedShort().
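
A small sketch that checks this claim in memory. It uses DataOutputStream and DataInputStream rather than a real file as a stand-in for RandomAccessFile; both follow the same DataOutput/DataInput contract of two bytes, high byte first. The class and method names are made up for illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;

// Hypothetical demo class; names are illustrative only.
class ShortRoundTrip {
    /** Returns the two bytes that writeShort(flag) produces. */
    static byte[] writeFlag(int flag) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            out.writeShort(flag); // only the low 16 bits of the int are written
        }
        return bytes.toByteArray();
    }

    /** Interprets the two bytes as a signed 16-bit value. */
    static short readSigned(byte[] twoBytes) throws IOException {
        return new DataInputStream(new ByteArrayInputStream(twoBytes)).readShort();
    }

    /** Interprets the same two bytes as an unsigned 16-bit value. */
    static int readUnsigned(byte[] twoBytes) throws IOException {
        return new DataInputStream(new ByteArrayInputStream(twoBytes)).readUnsignedShort();
    }
}
```

For `byte[] deleted = ShortRoundTrip.writeFlag(0x8000)`, the array holds the bytes 0x80 0x00; `readSigned(deleted)` gives -32768 and `readUnsigned(deleted)` gives 32768. The bytes in the file are identical either way; only the interpretation differs.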

----

Keeping the above points in mind, following 2 options can be chosen:

1)

if (readShort()==0) {
valid record;
}
else {
deleted record;
}

2)

int flag = readUnsignedShort(); // read the 2 byte flag once; a second read call would consume the next 2 bytes
if (flag == 0) {
valid record;
}
else if (flag == 0x8000) {
deleted record;
}
else {
no idea; // can someone explain ?
}

If we choose Option 1 above, we can read/write "0" for a valid record and any other value for a deleted record. Then, what is the significance for the value 0x8000 ?

I am a little bit confused.

Thanks & regards,
Rajesh.
 
Bert Bates
author
Sheriff
Posts: 8905
let's not get too detailed you guys...
 
Alain Dickson
Ranch Hand
Posts: 53
Hi Bert, let me take one more chance on this....

Rajesh:

The delete flag is 0x8000 == 32768, which is greater than the largest value a signed short can hold (32767).
BUT we have to write it in two bytes (the data file schema only gives us two bytes).
THEREFORE, read back as a Java short, 0x8000 (32768) shows up as -32768. (A Java short is signed, so it cannot hold this value as a positive number.)
THAT'S what the method writeShort(int x) of RandomAccessFile does. It writes only the low 16 bits of its argument, so writeShort(0x8000) stores the bytes 0x80 0x00, which readShort() returns as -32768.
WHEN you read using readUnsignedShort(), it returns 32768 == 0x8000.

Rajesh Said:
int flag = readUnsignedShort(); // read the 2 byte flag once; a second read call would consume the next 2 bytes
if (flag == 0) {
valid record;
}
else if (flag == 0x8000) {
deleted record;
}
else {
no idea; // can someone explain ?
}


Answer to the "else" part: the database is corrupt, because the schema does not allow any other value. So whenever you perform an operation on the database, check that the record is valid (don't read deleted or corrupted records), EXCEPT when you are adding a new record: then you might want to reuse the space of a deleted record, so you have to look for a deleted record (0x8000).
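
The three branches can be sketched as follows. The enum and class names are hypothetical, and treating every value other than 0 and 0x8000 as corruption follows the answer above rather than anything stated in the assignment instructions quoted in this thread.

```java
import java.io.DataInput;
import java.io.IOException;

// Hypothetical helper; names are illustrative only.
class RecordStatus {
    enum Status { VALID, DELETED, CORRUPT }

    /** Maps an unsigned 2 byte flag value to a record status. */
    static Status classify(int flag) {
        if (flag == 0x0000) {
            return Status.VALID;
        }
        if (flag == 0x8000) {
            return Status.DELETED;
        }
        return Status.CORRUPT; // the schema allows no other value
    }

    /** Reads the flag exactly once, unsigned, and classifies it. */
    static Status readStatus(DataInput in) throws IOException {
        return classify(in.readUnsignedShort());
    }
}
```

Reading the flag a single time matters: each readShort() or readUnsignedShort() call advances the file pointer by two bytes, so testing the flag with two separate read calls would compare two different parts of the file.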
 
Rajesh Moorthy
Ranch Hand
Posts: 30
Hi Alain,

That is a very good explanation. Thank you very much.

Regards,
Rajesh.
 