
Understanding the DB file

 
Vrinda Werdel
Ranch Hand
Posts: 75
Hi Guys,
I am having a hard time understanding the schema of the db file. Can you help? It is proving to be my stumbling block.
I would appreciate it if somebody could help.
regards
Vrinda.
 
George Marinkovich
Ranch Hand
Posts: 619
Hi Vrinda,
Originally posted by Vrinda Werdel:
I am having a hard time understanding the schema of the db file. Can you help? It is proving to be my stumbling block.

Well, you've asked a pretty general question, so without knowing specifically what you don't understand, I'll have to give a pretty general response.
The db file is a little unusual because it contains a schema data section as well as the expected header section and data section. Your assignment instructions give you enough information to read the header section. The header contains valuable information that you need in order to read the schema data section, namely the number of fields in each data record. In turn the number of fields tells you how many schema data items you'll be reading in the schema data section.
Each schema data item is comprised of three fields: name length, name, and field length. You can store the schema data in three arrays, one for each schema data field. The dimension of the arrays will, of course, equal the number of fields in a record. The first field is of fixed length so you always read the same number of bytes. The second field is of variable length, but the length is known since it's given by the first field (name length). The third field is fixed length so you always read the same number of bytes. So, you read these three schema data fields once for each field in the data record (equal to the number of fields value read in from the header).
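The three-array approach described above could be sketched roughly as follows. This is only an illustration: the class name SchemaReader is made up, and the 2-byte lengths and US-ASCII names are assumptions that you should check against your own assignment instructions.

```java
import java.io.DataInput;
import java.io.IOException;
import java.nio.charset.StandardCharsets;

/** Hypothetical holder for the three parallel schema arrays. */
class SchemaReader {
    final int[] nameLengths;   // number of bytes in each field name
    final String[] names;      // each field name
    final int[] fieldLengths;  // number of bytes each field occupies in a record

    SchemaReader(int fieldCount) {
        nameLengths = new int[fieldCount];
        names = new String[fieldCount];
        fieldLengths = new int[fieldCount];
    }

    /**
     * Reads fieldCount schema entries from 'in', which must already be
     * positioned just past the header. Assumes 2-byte lengths and
     * US-ASCII field names.
     */
    static SchemaReader read(DataInput in, int fieldCount) throws IOException {
        SchemaReader s = new SchemaReader(fieldCount);
        for (int i = 0; i < fieldCount; i++) {
            s.nameLengths[i] = in.readUnsignedShort();   // fixed-length item
            byte[] raw = new byte[s.nameLengths[i]];
            in.readFully(raw);                           // variable-length item
            s.names[i] = new String(raw, StandardCharsets.US_ASCII);
            s.fieldLengths[i] = in.readUnsignedShort();  // fixed-length item
        }
        return s;
    }
}
```

RandomAccessFile implements DataInput, so the same method works when you pass it the open data file directly.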
The next thing of interest is where the data records start. You know the size of the header since it's fixed and can be calculated from the assignment instructions. The size of the schema section is variable, so you will have to use a formula to calculate it. The size of the header plus the size of the schema data section is the location of the start of the data records. The size of a data record can also be determined from the schema data (remember to add one for the record validity flag).
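Those size calculations might look something like the sketch below. The class and method names are invented for illustration, and the width of each length slot and of the validity flag are parameters because they depend on your own file format.

```java
/** Hypothetical helper for the size arithmetic described above. */
class LayoutMath {
    /** Schema section size: for each field, a length slot, the name, and another length slot. */
    static int schemaSize(int[] nameLengths, int lengthSlotBytes) {
        int size = 0;
        for (int nameLength : nameLengths) {
            size += lengthSlotBytes + nameLength + lengthSlotBytes;
        }
        return size;
    }

    /** The data records start right after the header and the schema section. */
    static int dataStart(int headerSize, int schemaSize) {
        return headerSize + schemaSize;
    }

    /** One record is the validity flag followed by every field. */
    static int recordSize(int[] fieldLengths, int flagBytes) {
        int size = flagBytes;
        for (int fieldLength : fieldLengths) {
            size += fieldLength;
        }
        return size;
    }
}
```

For example, assuming a 6-byte header, 1-byte length slots, field names of 4 and 8 bytes, and fields of 32 and 64 bytes, the schema section is (1 + 4 + 1) + (1 + 8 + 1) = 16 bytes, the records start at 6 + 16 = 22, and each record is 1 + 32 + 64 = 97 bytes.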
To read a particular data record you seek to a location given by a formula that takes into account the location of the start of the data records, the record number, and the record size. You read the validity flag to determine whether the record is valid, and if it is you read the record field by field according to the schema data.
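A rough sketch of that seek-and-read step follows. It assumes a 1-byte flag where 0 means the record is valid, US-ASCII field values padded with spaces or nulls, and record numbers starting at 0; the names RecordReader, recordOffset, and readRecord are made up for the example.

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.charset.StandardCharsets;

/** Hypothetical reader for a single record. */
class RecordReader {
    /** Location of record recNo: start of the data section plus recNo whole records. */
    static long recordOffset(long dataStart, int recNo, int recordSize) {
        return dataStart + (long) recNo * recordSize;
    }

    /**
     * Returns the field values of record recNo, or null if its flag says
     * the record is not valid. Assumes a 1-byte flag with 0 meaning valid.
     */
    static String[] readRecord(RandomAccessFile raf, long dataStart,
                               int recNo, int[] fieldLengths) throws IOException {
        int recordSize = 1;                      // the flag byte
        for (int len : fieldLengths) {
            recordSize += len;
        }
        raf.seek(recordOffset(dataStart, recNo, recordSize));
        if (raf.readUnsignedByte() != 0) {
            return null;                         // flagged as deleted or invalid
        }
        String[] values = new String[fieldLengths.length];
        for (int i = 0; i < fieldLengths.length; i++) {
            byte[] raw = new byte[fieldLengths[i]];
            raf.readFully(raw);                  // read exactly one field
            // trim() drops trailing space and null padding
            values[i] = new String(raw, StandardCharsets.US_ASCII).trim();
        }
        return values;
    }
}
```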
That's basically all I can say without knowing what the specific problem is.
Hope this helps,
George
 
Vrinda Werdel
Ranch Hand
Posts: 75
Thanks George for the explanation. Maybe I could visualise it better with an example.
Suppose I have a data item with Location="Austin", then it would be represented as
name length = 8
name = Location
Field Length = 6
Is this how this is represented? Am I getting it right? Then how can I know that the value is actually "Austin"?
Also, when you seek to a particular position and at that position you have an integer val. (say 20) how do you read it? (raf.readInt() is not working?).
In the db file description of my file, I am given some info. like " byte numeric, offset to start of record zero ".
My guess is that I can get hold of this value and directly seek() to that position. Also how do I know I have read a record completely and reached the start of next record?

I have something like this in my assignment.
Schema description section.
Repeated for each field in a record:
2 byte numeric, length in bytes of field name
n bytes (defined by previous entry), field name
2 byte numeric, field length in bytes
end of repeating block
I would really appreciate it if you could elaborate on this. I am badly stuck and unable to make much headway because of this.
Vrinda.
 
Ken Krebs
Ranch Hand
Posts: 451
Vrinda,
Also, when you seek to a particular position and at that position you have an integer val. (say 20) how do you read it? (raf.readInt() is not working?).

Not working ??? You probably need to post a code snippet.
In the db file description of my file, I am given some info. like " byte numeric, offset to start of record zero ".
My guess is that I can get hold of this value and directly seek() to that position. Also how do I know I have read a record completely and reached the start of next record?

The RandomAccessFile's file pointer will update as you read or write to the file. It's probably best to always read and validate the header and schema definition information before reading or writing any data records to make sure the file hasn't been corrupted. Once you've done that, you should be at the start of the data section. After you read/write a record's data correctly, the current file position will be pointing at the next record, unless of course you have read the last record in which case you should be at the end of the file.
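To see the file pointer move, something like the following sketch can help. It assumes a header made of two 4-byte values followed by a 2-byte value; the class name FilePointerDemo is invented for the example.

```java
import java.io.IOException;
import java.io.RandomAccessFile;

/** Hypothetical demo of how each read advances the file pointer. */
class FilePointerDemo {
    /** Returns the file pointer before any read and after each of three reads. */
    static long[] trace(RandomAccessFile raf) throws IOException {
        long[] positions = new long[4];
        raf.seek(0);
        positions[0] = raf.getFilePointer();  // start of file
        raf.readInt();                        // consumes 4 bytes
        positions[1] = raf.getFilePointer();
        raf.readInt();                        // consumes 4 more bytes
        positions[2] = raf.getFilePointer();
        raf.readShort();                      // consumes 2 bytes
        positions[3] = raf.getFilePointer();
        return positions;
    }
}
```

With that assumed header the positions come out as 0, 4, 8, and 10, so no explicit seek is needed between consecutive reads.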
It's probably also a good idea to have at one time manually verified that the data file you've been given is consistent with the description of the data file format you've been given in your instructions. This is probably best done with a hex editor or viewer program. This will also allow you to ascertain inconsistencies and enable you to resolve ambiguities you may find in the specification.
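If no hex viewer is at hand, a few lines of Java can stand in for one. This is only a sketch of a dump utility, not a replacement for a proper tool; the class name HexDump and the 16-bytes-per-row layout are arbitrary choices.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;

/** Hypothetical minimal hex viewer: offset, 16 hex bytes, printable ASCII. */
class HexDump {
    static String dump(byte[] data) {
        StringBuilder sb = new StringBuilder();
        for (int off = 0; off < data.length; off += 16) {
            int end = Math.min(off + 16, data.length);
            sb.append(String.format("%08x ", off));
            for (int i = off; i < end; i++) {
                sb.append(String.format(" %02x", data[i] & 0xff));
            }
            for (int i = end; i < off + 16; i++) {
                sb.append("   ");             // keep the ASCII column aligned
            }
            sb.append("  ");
            for (int i = off; i < end; i++) {
                int b = data[i] & 0xff;
                sb.append(b >= 0x20 && b < 0x7f ? (char) b : '.');
            }
            sb.append('\n');
        }
        return sb.toString();
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: java HexDump <file>");
            return;
        }
        System.out.print(dump(Files.readAllBytes(Paths.get(args[0]))));
    }
}
```

Reading the output next to the format description makes it easy to check that each header and schema value sits where the instructions say it should.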
[ February 18, 2004: Message edited by: Ken Krebs ]
 
George Marinkovich
Ranch Hand
Posts: 619
Hi Vrinda,
Ken's given you some good ideas that I think might help.
From your question I think I see one of your misunderstandings. The database file contains two kinds of data. It contains the data I think you're expecting to see, but it also contains data about the data, or metadata. This metadata is what the data schema section is all about. When you read the data file the first thing you do is process the header information. I assume you have no problem reading the header information. The next thing you need to do is process the data schema data (the metadata that describes how the record data that appears later in the data file is structured). You must process header data and the schema data (a good time to do this is when you open the data file). Processing the header and data schema will provide you all sorts of information that you need to process the rest of the data file (the part of the data file that contains the record data that you're probably expecting). Processing the header and data schema will give you:
1) the number of fields in a database record
2) the name of each record field in a database record
3) the number of bytes in each field name
4) the number of bytes contained in each database record field
From this you can calculate everything else you will need to process the rest of the database file (the real data).
Maybe a picture of a simplified database file (with only 3 records and with only 2 fields per record) will help:
dfe = data file element
<dfe name (actual value of the dfe) - number of bytes needed for the dfe>
----------------------------------------------------------------
<Magic cookie value (512) - 4 bytes>
<Number of fields in each record (2) - 2 bytes>
<Length in bytes of first schema field name (4) - 1 byte>
<first schema field name (name) - 4 bytes>
<first schema field length in bytes (32) - 1 byte>
<Length in bytes of second schema field name (8) - 1 byte>
<second schema field name (location) - 8 bytes>
<second schema field length in bytes (64) - 1 byte>
<first record deleted record flag (0) - 1 byte>
<first record, first field value ("Dogs With Tools") - 32 bytes>
<first record, second field value ("Hilldale") - 64 bytes>
<second record deleted record flag (0) - 1 byte>
<second record, first field value ("Toll Brothers") - 32 bytes>
<second record, second field value ("Boston") - 64 bytes>
<third record deleted record flag (0) - 1 byte>
<third record, first field value ("Boone and Daniels") - 32 bytes>
<third record, second field value ("Middleton") - 64 bytes>
----------------------------------------------------------------------
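The picture above can also be turned into code that builds the simplified file in memory and parses it back. This is a sketch only: the class name SimplifiedDbFile is made up, and space padding, US-ASCII text, and big-endian numbers are assumptions that the picture does not state.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical builder and parser for the simplified file in the picture. */
class SimplifiedDbFile {
    /** Pads text with spaces to exactly 'width' bytes (the padding byte is an assumption). */
    static byte[] pad(String text, int width) {
        byte[] out = new byte[width];
        Arrays.fill(out, (byte) ' ');
        byte[] src = text.getBytes(StandardCharsets.US_ASCII);
        System.arraycopy(src, 0, out, 0, src.length);
        return out;
    }

    /** Writes each data file element in the order shown in the picture. */
    static byte[] build() {
        String[][] rows = {
            {"Dogs With Tools", "Hilldale"},
            {"Toll Brothers", "Boston"},
            {"Boone and Daniels", "Middleton"}};
        ByteBuffer buf = ByteBuffer.allocate(22 + rows.length * 97);
        buf.putInt(512);                      // magic cookie, 4 bytes
        buf.putShort((short) 2);              // fields per record, 2 bytes
        buf.put((byte) 4).put("name".getBytes(StandardCharsets.US_ASCII)).put((byte) 32);
        buf.put((byte) 8).put("location".getBytes(StandardCharsets.US_ASCII)).put((byte) 64);
        for (String[] row : rows) {
            buf.put((byte) 0);                // deleted record flag, 0 = not deleted
            buf.put(pad(row[0], 32)).put(pad(row[1], 64));
        }
        return buf.array();
    }

    /** Reads header, then schema, then every record; skips deleted records. */
    static List<String[]> parse(byte[] file) {
        ByteBuffer buf = ByteBuffer.wrap(file);
        buf.getInt();                         // magic cookie (not validated here)
        int fieldCount = buf.getShort();
        int[] fieldLengths = new int[fieldCount];
        for (int i = 0; i < fieldCount; i++) {
            byte[] name = new byte[buf.get() & 0xff];
            buf.get(name);                    // field name, unused in this sketch
            fieldLengths[i] = buf.get() & 0xff;
        }
        List<String[]> records = new ArrayList<>();
        while (buf.hasRemaining()) {
            boolean deleted = buf.get() != 0;
            String[] values = new String[fieldCount];
            for (int i = 0; i < fieldCount; i++) {
                byte[] raw = new byte[fieldLengths[i]];
                buf.get(raw);
                values[i] = new String(raw, StandardCharsets.US_ASCII).trim();
            }
            if (!deleted) {
                records.add(values);
            }
        }
        return records;
    }
}
```

Note that the parser never hard-codes 32 or 64: it learns the field lengths from the schema section, which is the whole point of the metadata.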

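So the sample data file contains 3 records as follows:

name                 location
Dogs With Tools      Hilldale
Toll Brothers        Boston
Boone and Daniels    Middleton

The sample file also contains the following schema data:

field name    name length (bytes)    field length (bytes)
name          4                      32
location      8                      64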
 
Javini Javono
Ranch Hand
Posts: 286
Hi,

It's probably also a good idea to have at one time manually verified that the data file you've
been given is consistent with the description of the data file format you've been given in
your instructions. This is probably best done with a hex editor or viewer program. This will
also allow you to ascertain inconsistencies and enable you to resolve ambiguities you may
find in the specification.

I certainly found it valuable to use a free hex viewer when I first started out on this project.
(I found a free hex viewer for Windows, but never for Mac OS X)
I certainly felt more comfortable visually verifying that there really was some bit of data
where the schema said it was. And, simultaneously reading the Sun instructions while
looking at the hex version of the file probably made it easier to understand Sun's given
instructions.
Thanks,
Javini Javono
 
Vrinda Werdel
Ranch Hand
Posts: 75
George, thanks a ton for the explanation. I am able to see things a lot more clearly now. The example you quoted definitely helped.
Thanks Ken and Javini as well. BTW, Ken, here's the code I am trying to get working.
I have a header section that reads like this.
4 byte numeric, magic cookie value identifies this as a data file
4 byte numeric, offset to start of record zero
2 byte numeric, number of fields in each record
What I am doing is seeking to position 9 and trying to read the value for the number of fields in each record. When I run this, I get an unusually big number. My guess is that it may be a hex value. Or am I missing something? Or is the way I am trying to read it wrong?
public class FileReaderClass {
    public void read() {
        RandomAccessFile raf = new RandomAccessFile("xyz.db","r");
        raf.read(9);
        System.out.println("The number of fields is "+raf.readInt());
    }
}

Again appreciate all the assistance.
regards
Vrinda
 
Ken Krebs
Ranch Hand
Posts: 451
Vrinda,
Please place your code in a "CODE" Instant UBB Code block (see near the foot of the page). It will hold the space formatting and be easier to read.

This code will not compile.
  • raf.read(9); // There is no such method. See the Javadoc. Look at each read method and find the correct method for the chunk of data you wish to read.
  • The other 2 lines in read() can both throw IOExceptions. These must be either declared in the method's throws clause or be caught and handled locally.

Here is some modified code along with a main method to run the read operation.
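A minimal sketch of code fitting that description, not necessarily the listing Ken posted. It assumes the file is named xyz.db as in Vrinda's snippet and that the magic cookie is the first 4 bytes of the file; the class and method names are made up for illustration.

```java
import java.io.IOException;
import java.io.RandomAccessFile;

/** Hypothetical reader for the first header value. */
class MagicCookieReader {
    /** Reads the 4-byte magic cookie at the very start of the file. */
    static int readMagicCookie(String path) throws IOException {
        // try-with-resources closes the file even if the read fails
        try (RandomAccessFile raf = new RandomAccessFile(path, "r")) {
            return raf.readInt();
        }
    }

    public static void main(String[] args) {
        try {
            System.out.println("The magic cookie is " + readMagicCookie("xyz.db"));
        } catch (IOException e) {
            // covers a missing file as well as a file shorter than 4 bytes
            System.err.println("Could not read xyz.db: " + e);
        }
    }
}
```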

This code will either print the "magic cookie" value or report some type of IOException.
    Hope that helps.
     
    Vrinda Werdel
    Ranch Hand
    Posts: 75
    Hi Ken,
    It was raf.seek(9). My apologies for the mistake. The changed code is as below.
    regards
    Vrinda

    [Andrew: broke up single line of code so that horizontal scrolling is not required]
    [ February 19, 2004: Message edited by: Andrew Monkhouse ]
     
    Ken Krebs
    Ranch Hand
    Posts: 451
    Vrinda,
    File pointers start at 0, not 1, so you would want to seek to 8 instead of 9, and then you would want to readShort() instead of readInt() because the value is 2 bytes, not 4. Please do yourself a favor and read the Javadocs.
    And you still have to handle the exceptions.
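Put together, those corrections might look like the sketch below. It assumes the header layout Vrinda quoted (4-byte magic cookie, 4-byte offset, 2-byte field count) and the file name xyz.db; the class and method names are invented for the example.

```java
import java.io.IOException;
import java.io.RandomAccessFile;

/** Hypothetical reader for the number of fields in each record. */
class FieldCountReader {
    static int readFieldCount(String path) throws IOException {
        try (RandomAccessFile raf = new RandomAccessFile(path, "r")) {
            // Bytes 0-3 hold the magic cookie and bytes 4-7 the offset,
            // so the 2-byte field count starts at zero-based position 8.
            raf.seek(8);
            return raf.readShort();
        }
    }

    public static void main(String[] args) {
        try {
            System.out.println("The number of fields is " + readFieldCount("xyz.db"));
        } catch (IOException e) {
            System.err.println("Could not read xyz.db: " + e);
        }
    }
}
```

This also explains the unusually big number: seeking to 9 and calling readInt() combines the low byte of the field count with the first three bytes of the schema section into one 4-byte value.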
     
    Vrinda Werdel
    Ranch Hand
    Posts: 75
    thanks Ken.
    Vrinda
     