
Fixed Length Record

I have a huge text file, which I have to parse, writing the relevant records to a binary file. I need to keep the length of each record fixed so that records can be retrieved easily.
At present the approach I am following is:
(1) Create a class with long, float and StringBuffer (set to a fixed length) fields.
(2) Instantiate the above class, fill in the data and serialize it.
The problem I am facing is that each record occupies much more space than I can account for by adding up the space required by each of its fields (for example, a long needs 4 bytes, a float 4 bytes, and a char 1 or 2 bytes). I would expect an object to occupy roughly as much space as the sum of the space needed by its fields.
Question 1: Has anyone faced a similar problem and solved it successfully?
Question 2: I am thinking of writing the individual fields directly to the binary file, without encapsulating them in an object. My problem is that since some of the data is in the form of Strings, different records occupy different amounts of space. Is there any way I can make the String data a constant length? (char[] does not work, and StringBuffer.setLength(someConstant).toString().toBytes() does not work.)
Question 3: I also noticed some strange behaviour. When I instantiate an object, serialize it, then simply change the values of that object's fields and serialize it again, the second serialization does not happen properly. But if I create a new object and populate its fields with the values for the second record, it serializes properly. Can anyone explain this?
Thanks
Mohan
To understand why the records take up different lengths in a binary file, you need to know how String objects are stored. When serialized, Strings are written in a compact format, a modified form of UTF-8. Most 7-bit (ASCII) characters are stored using 8 bits instead of the 16 bits a Java char normally uses, while other characters take two or three bytes. Also, when Strings are stored, extra space is used to store length information. I would expect the StringBuffer class to have even more fields than the String class, necessitating more space when a StringBuffer is stored.
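Stu's two points (variable-width characters plus a length field) can be seen with DataOutputStream.writeUTF, which writes a 2-byte length followed by the modified UTF-8 encoding documented for DataInput. This is a minimal sketch; the class UtfSizes and the helper utfSize are made-up names for illustration.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

public class UtfSizes {
    // Number of bytes DataOutputStream.writeUTF produces for s:
    // a 2-byte length prefix followed by the modified UTF-8 bytes.
    static int utfSize(String s) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            out.writeUTF(s);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.size();
    }

    public static void main(String[] args) {
        System.out.println(utfSize("abc"));    // 2 + 3 x 1 byte = 5
        System.out.println(utfSize("\u00e9")); // e-acute: 2 + 2 bytes = 4
        System.out.println(utfSize("\u20ac")); // euro sign: 2 + 3 bytes = 5
        System.out.println(utfSize("\u0000")); // null char is not 1 byte here: 2 + 2 = 4
    }
}
```

So two Strings with the same number of characters can still produce different byte counts, which is why records containing them end up with different lengths.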
I'm not sure about question 3.
Hope this helps,
-Stu
Q1: The length of a long is 8 bytes, not 4. Perhaps you're just miscounting?
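To see where the extra space in Question 1 goes, one can compare the bytes Java serialization produces for a small object with the bytes needed for its raw field values. This sketch assumes a hypothetical class Rec with one long and one float; serialization adds a stream header and a class descriptor on top of the field data, while DataOutputStream writes exactly Long.BYTES + Float.BYTES bytes.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;

public class SizeComparison {
    // Hypothetical record class with one long and one float field.
    static class Rec implements Serializable {
        private static final long serialVersionUID = 1L;
        long id;
        float value;

        Rec(long id, float value) {
            this.id = id;
            this.value = value;
        }
    }

    // Bytes produced by Java serialization (stream header + class descriptor + field data).
    static int serializedSize(Rec r) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(r);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.size();
    }

    // Bytes produced by writing only the raw field values.
    static int rawSize(Rec r) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            out.writeLong(r.id);     // always 8 bytes (Long.BYTES)
            out.writeFloat(r.value); // always 4 bytes (Float.BYTES)
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.size();
    }

    public static void main(String[] args) {
        Rec r = new Rec(42L, 3.5f);
        System.out.println("serialized: " + serializedSize(r) + " bytes");
        System.out.println("raw fields: " + rawSize(r) + " bytes"); // 12
    }
}
```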
Q2: There's no direct API I know of to make your String record fixed-length. Essentially you'll have to do it yourself by figuring out how long your record is, and adding extra 0 bytes (or whatever value you want) until the length has the required size. If the record is too big to begin with, throw an error or something. There are numerous ways you can encode a String - I'd probably use the one described for the writeUTF() and readUTF() methods of DataOutput and DataInput. (In general, DataOutputStream and DataInputStream have a lot of methods that will be useful to you for something like this.)
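A minimal sketch of Jim's suggestion, assuming a hypothetical layout of a long id, a float value and a name field of 32 bytes: the name is encoded as writeUTF does, padded with zero bytes to the fixed width, and rejected if it is too long. RandomAccessFile implements both DataOutput and DataInput, so record number n can be reached directly with seek(n * RECORD_SIZE). The class, constants and method names are illustrative only.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.File;
import java.io.IOException;
import java.io.RandomAccessFile;
import java.io.UncheckedIOException;

public class FixedLengthRecords {
    // Hypothetical layout: long id (8 bytes) + float value (4 bytes) + name field.
    static final int NAME_FIELD = 32; // bytes reserved for the name, including writeUTF's 2-byte length
    static final int RECORD_SIZE = Long.BYTES + Float.BYTES + NAME_FIELD;

    // Encodes name as writeUTF does, then pads with zero bytes to exactly NAME_FIELD bytes.
    static byte[] fixedName(String name) {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(buf)) {
            out.writeUTF(name);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        byte[] encoded = buf.toByteArray();
        if (encoded.length > NAME_FIELD) {
            throw new IllegalArgumentException("name too long for fixed field: " + name);
        }
        byte[] field = new byte[NAME_FIELD]; // new Java arrays are zero-filled
        System.arraycopy(encoded, 0, field, 0, encoded.length);
        return field;
    }

    // Writes record number `index` at byte offset index * RECORD_SIZE.
    static void write(RandomAccessFile file, long index, long id, float value, String name)
            throws IOException {
        byte[] nameField = fixedName(name); // validate before touching the file
        file.seek(index * RECORD_SIZE);
        file.writeLong(id);
        file.writeFloat(value);
        file.write(nameField);
    }

    // Reads the name of record `index`; readUTF honours the length prefix, so padding is ignored.
    static String readName(RandomAccessFile file, long index) throws IOException {
        file.seek(index * RECORD_SIZE + Long.BYTES + Float.BYTES);
        return file.readUTF();
    }

    // Reads the id of record `index`.
    static long readId(RandomAccessFile file, long index) throws IOException {
        file.seek(index * RECORD_SIZE);
        return file.readLong();
    }

    public static void main(String[] args) throws IOException {
        File tmp = File.createTempFile("records", ".bin");
        tmp.deleteOnExit();
        try (RandomAccessFile file = new RandomAccessFile(tmp, "rw")) {
            write(file, 0, 1L, 1.5f, "first");
            write(file, 1, 2L, 2.5f, "second record");
            System.out.println(file.length());     // 88, i.e. 2 * RECORD_SIZE
            System.out.println(readName(file, 1)); // second record
            System.out.println(readId(file, 0));   // 1
        }
    }
}
```

Because NAME_FIELD counts encoded bytes rather than characters, the number of characters that fit depends on which characters the names contain; a single-byte charset would make that limit predictable at the cost of the characters it can represent.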
Q3: No idea. What exactly happens when the serialization "does not happen properly"?
Thanks Stu and Jim for your replies. They helped me a great deal in completing my project work.
Regarding question 3,
here is a clarification:
(a) I create object instance of class A, say a.
(b) I populate fields of a.
(c) I serialize a
(d) Now I again populate the fields of a with different data. (Note: I did not do a = new A().)
(e) Now I serialize it.
If I do this, then I cannot read the object from the serialized file. But if I modify step (d) to do a = new A(), then I am able to read the record from the serialized file.
So could anyone explain this?
Thanks
Mohan
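The behaviour in steps (a) to (e) is consistent with how ObjectOutputStream treats repeated objects, assuming both records were written through the same stream: the stream remembers every object it has already written, so a second writeObject of the same instance emits only a back-reference handle and the updated field values never reach the file. Exactly why reading then failed depends on how the file was read back, which the thread does not say. This sketch uses a hypothetical stand-in for class A and shows both the effect and the documented remedy, ObjectOutputStream.reset().

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.Arrays;

public class SharedReferenceDemo {
    // Hypothetical stand-in for Mohan's class A.
    static class A implements Serializable {
        private static final long serialVersionUID = 1L;
        int value;
    }

    // Serializes one A instance twice through the same stream, changing its field in
    // between, then returns the values read back from both records.
    static int[] writeTwiceAndRead(boolean resetBetween) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            A a = new A();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                a.value = 1;
                out.writeObject(a);
                a.value = 2;          // reuse the same instance, no a = new A()
                if (resetBetween) {
                    out.reset();      // discard the stream's table of already-written objects
                }
                out.writeObject(a);   // without reset(), only a back-reference to the first copy is written
            }
            try (ObjectInputStream in =
                    new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
                A first = (A) in.readObject();
                A second = (A) in.readObject();
                return new int[] {first.value, second.value};
            }
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(writeTwiceAndRead(false))); // [1, 1]
        System.out.println(Arrays.toString(writeTwiceAndRead(true)));  // [1, 2]
    }
}
```

ObjectOutputStream.writeUnshared() is another option: it always writes the object as a new, unique object. Creating a new instance per record, as Mohan found, also avoids the back-reference. Because the stream keeps references to everything it has written until reset() is called, calling reset() now and then also keeps memory in check when serializing a huge file's worth of records.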
Hi,
Just one more question:
Say in step (d) I do A a = new A() and then serialize it. I find that the size of the object serialized the first time (i.e., in step (c)) is larger than the size of any of the objects serialized later (by repeating steps (d) and (e) over and over).
My assumption is that the first object serialized takes more space because it contains information about the field names, while the objects serialized later do not contain that information.
Can anyone confirm this?
Thanks
Mohan
[ July 11, 2002: Message edited by: Mohan Panigrahi ]
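Mohan's assumption matches how Java serialization works: the first time an object of a class is written to a given ObjectOutputStream, the stream also writes a class descriptor (class name, serialVersionUID, and the name and type of each serializable field), and later objects of the same class in that stream refer back to it by a handle. The sketch below, using a hypothetical class A, measures how many bytes each successive writeObject call adds.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;
import java.util.Arrays;

public class DescriptorOverhead {
    // Hypothetical record class, similar to the one described in the first post.
    static class A implements Serializable {
        private static final long serialVersionUID = 1L;
        long id;
        float value;
    }

    // Returns how many bytes each successive writeObject call adds to one stream.
    static int[] incrementalSizes(int count) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        int[] sizes = new int[count];
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.flush(); // push the 4-byte stream header out first so it is not counted below
            for (int i = 0; i < count; i++) {
                A a = new A(); // a fresh instance per record, as in the modified step (d)
                a.id = i;
                a.value = i;
                int before = bytes.size();
                out.writeObject(a);
                out.flush();   // ObjectOutputStream buffers internally; flush before measuring
                sizes[i] = bytes.size() - before;
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return sizes;
    }

    public static void main(String[] args) {
        // The first entry includes the class descriptor; later ones only refer back to it.
        System.out.println(Arrays.toString(incrementalSizes(3)));
    }
}
```

A consequence for fixed-length storage: a later record written this way cannot be deserialized on its own by seeking into the middle of the file, because it depends on the stream header and the descriptor written with the first record.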