
Fixed Length Record

 
Mohan Panigrahi
Ranch Hand
Posts: 142
I have a huge text file, which I have to parse, writing the relevant records to a binary file. I need to keep the length of each record fixed, so that records can be retrieved easily.
At present the approach I am following is:
(1) Create a class with long, float and StringBuffer (set to a fixed length) fields.
(2) Instantiate the above class, fill in the data and serialize it.
The problem I am facing is that the space occupied by each record is much more than I can account for by adding up the space required by each of the fields (for example, a long will need 4 bytes, a float 4 bytes, a char 1 or 2 bytes, so an object should occupy about as much space as the sum of the space needed by its fields).
Question 1: Has anyone faced a similar problem and solved it successfully?
Question 2: I am thinking of writing the individual fields directly to the binary file, without encapsulating them in an object. My problem is that since some of the data is in the form of Strings, different records occupy different amounts of space. Is there any way I can make the String data a constant length? (char[] does not work, and calling StringBuffer.setLength(someConstant) followed by toString().getBytes() does not work either.)
Question 3: I also came across some strange behaviour. When I instantiate an object, serialize it, then simply change the values of that object's fields and serialize it again, the serialization does not happen properly. But if I create a new object and populate its fields with the values for the second record, it serializes properly. Can anyone explain this?
Thanks
Mohan
 
Stu Glassman
Ranch Hand
Posts: 91
To see why the records take up different lengths in a binary file, you need to understand how String objects are stored. When written to a stream, Strings are stored in a compact variable-length format, a modified form of UTF-8. Characters that fit in 7 bits are stored using 8 bits instead of the 16 bits a char normally occupies, while other characters take two or three bytes. Also, when Strings are stored, extra space is used for length information. I would expect that the StringBuffer class has even more fields than the String class, necessitating more space when a StringBuffer is stored.
I'm not sure about question 3.
Hope this helps,
-Stu
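To make the variable-length point concrete, here is a minimal sketch that measures how many bytes DataOutputStream.writeUTF() emits for a few strings of equal character count. The class and method names (UtfLengthDemo, encodedSize) are made up for illustration; only the java.io calls are real API.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

// Hypothetical demo class, not from the original thread.
class UtfLengthDemo {

    /** Returns the number of bytes writeUTF() produces for the given string. */
    static int encodedSize(String s) {
        try {
            ByteArrayOutputStream buf = new ByteArrayOutputStream();
            DataOutputStream out = new DataOutputStream(buf);
            out.writeUTF(s); // 2-byte length prefix, then modified UTF-8 bytes
            out.flush();
            return buf.size();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // All three strings are three chars long, yet the byte counts differ.
        System.out.println(encodedSize("abc"));      // three 1-byte chars
        System.out.println(encodedSize("ab\u00e9")); // e-acute needs 2 bytes
        System.out.println(encodedSize("ab\u20ac")); // euro sign needs 3 bytes
    }
}
```

So a fixed number of characters does not give a fixed number of bytes, which is why the records come out with different lengths.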
 
Jim Yingst
Wanderer
Sheriff
Posts: 18671
Q1: The length of a long is 8 bytes, not 4. Perhaps you're just miscounting?
Q2: There's no direct API I know of to make your String record fixed-length. Essentially you'll have to do it yourself by figuring out how long your record is, and adding extra 0 bytes (or whatever value you want) until the record reaches the required size. If the record is too big to begin with, throw an error or something. There are numerous ways you can encode a String - I'd probably use the one described for the writeUTF() and readUTF() methods of DataOutput and DataInput. (In general, DataOutputStream and DataInputStream have a lot of methods that will be useful to you for something like this.)
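The padding approach described for Q2 might look roughly like the sketch below. The field layout (one long, one float, one name), the 20-byte name field and all class, method and constant names are assumptions chosen for illustration, not anything from the original project.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

// Hypothetical record layout: long id, float value, name padded to 20 bytes.
class FixedRecordDemo {
    static final int NAME_FIELD_BYTES = 20;                  // includes writeUTF's 2-byte prefix
    static final int RECORD_BYTES = 8 + 4 + NAME_FIELD_BYTES; // long + float + name field

    /** Encodes one record into exactly RECORD_BYTES bytes. */
    static byte[] encode(long id, float value, String name) {
        try {
            // Encode the name separately so its byte length is known before padding.
            ByteArrayOutputStream nameBuf = new ByteArrayOutputStream();
            new DataOutputStream(nameBuf).writeUTF(name);
            if (nameBuf.size() > NAME_FIELD_BYTES) {
                throw new IllegalArgumentException("name too long: " + name);
            }
            ByteArrayOutputStream rec = new ByteArrayOutputStream(RECORD_BYTES);
            DataOutputStream out = new DataOutputStream(rec);
            out.writeLong(id);
            out.writeFloat(value);
            nameBuf.writeTo(out);
            for (int i = nameBuf.size(); i < NAME_FIELD_BYTES; i++) {
                out.writeByte(0); // zero padding up to the fixed field width
            }
            out.flush();
            return rec.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Reads the name of the record at the given index from concatenated records. */
    static String decodeName(byte[] data, int index) {
        try {
            DataInputStream in = new DataInputStream(
                    new ByteArrayInputStream(data, index * RECORD_BYTES, RECORD_BYTES));
            in.readLong();       // skip id
            in.readFloat();      // skip value
            return in.readUTF(); // the length prefix says where the name ends; padding is ignored
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        ByteArrayOutputStream file = new ByteArrayOutputStream();
        file.writeBytes(encode(1L, 2.5f, "abc"));
        file.writeBytes(encode(2L, 7.0f, "a longer name"));
        System.out.println(decodeName(file.toByteArray(), 1));
    }
}
```

Because every record is RECORD_BYTES long, record n starts at offset n * RECORD_BYTES. With a real file the same arithmetic could be used with RandomAccessFile.seek() to jump straight to a record.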
Q3: No idea. What exactly happens when the serialization "does not happen properly"?
 
Mohan Panigrahi
Ranch Hand
Posts: 142
Thanks Stu and Jim for your replies. They helped me complete my project work.
Regarding question 3, here is a clarification:
(a) I create an instance of class A, say a.
(b) I populate the fields of a.
(c) I serialize a.
(d) Now I populate the fields of a again with different data. (Note: I did not do a = new A().)
(e) Now I serialize it again.
If I do this, then I cannot read the second record back from the serialized file. But if I modify step (d) and do a = new A(), then I am able to read the record from the serialized file.
So could anyone explain this?
Thanks
Mohan
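One likely explanation for steps (a) to (e), assuming both writes go through the same ObjectOutputStream: the stream remembers every object it has already written, and a second writeObject() on the same instance writes only a back-reference to the first copy, not the changed field values. Calling reset() between the writes (or using writeUnshared()) makes the stream write the object in full again. The sketch below shows the effect; the class A with a single id field and the names ReuseDemo and roundTrip are stand-ins for illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

// Hypothetical demo class, not from the original thread.
class ReuseDemo {

    /** Stand-in for the record class in the question. */
    static class A implements Serializable {
        private static final long serialVersionUID = 1L;
        long id;
    }

    /**
     * Writes the same instance twice, changing its field in between,
     * and returns the two ids that come back when the stream is read.
     */
    static long[] roundTrip(boolean resetBetweenWrites) {
        try {
            ByteArrayOutputStream buf = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(buf)) {
                A a = new A();
                a.id = 1;
                out.writeObject(a);
                if (resetBetweenWrites) {
                    out.reset(); // forget previously written objects
                }
                a.id = 2;        // same instance, new data
                out.writeObject(a);
            }
            try (ObjectInputStream in =
                         new ObjectInputStream(new ByteArrayInputStream(buf.toByteArray()))) {
                A first = (A) in.readObject();
                A second = (A) in.readObject();
                return new long[] { first.id, second.id };
            }
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        long[] plain = roundTrip(false);
        long[] withReset = roundTrip(true);
        System.out.println(plain[0] + ", " + plain[1]);         // second record is lost
        System.out.println(withReset[0] + ", " + withReset[1]); // second record survives
    }
}
```

Creating a new A() for each record avoids the problem because each instance is a different object as far as the stream is concerned.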
 
Mohan Panigrahi
Ranch Hand
Posts: 142
Hi,
Just one more question :
Say in step (d) I do a = new A() and then serialize it. I find that the object serialized the first time (i.e. in step (c)) takes more space than any of the objects serialized later (by repeating steps (d) and (e) over and over).
My assumption is that the first object serialized takes more space because it contains information about the field names, while the objects serialized later do not contain that information.
Can anyone confirm this?
Thanks
Mohan
[ July 11, 2002: Message edited by: Mohan Panigrahi ]
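That assumption matches how ObjectOutputStream behaves: the class descriptor (class name, serialVersionUID, field names and types) is written once per stream, and later objects of the same class only refer back to it. A small sketch to check this by measuring how much each writeObject() adds to the output; the class A with a long and a float, and the names SizeDemo and sizes, are assumptions for illustration.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;

// Hypothetical demo class, not from the original thread.
class SizeDemo {

    /** Stand-in record class with two primitive fields. */
    static class A implements Serializable {
        private static final long serialVersionUID = 1L;
        long id;
        float value;
    }

    /** Returns the number of bytes added to the stream by each of three new objects. */
    static int[] sizes() {
        try {
            ByteArrayOutputStream buf = new ByteArrayOutputStream();
            ObjectOutputStream out = new ObjectOutputStream(buf);
            out.flush();                 // push the stream header out first
            int previous = buf.size();   // so the header is not counted below
            int[] sizes = new int[3];
            for (int i = 0; i < sizes.length; i++) {
                A a = new A();           // a fresh instance each time
                a.id = i;
                a.value = i * 1.5f;
                out.writeObject(a);
                out.flush();
                sizes[i] = buf.size() - previous;
                previous = buf.size();
            }
            out.close();
            return sizes;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        for (int size : sizes()) {
            System.out.println(size); // the first value is larger than the rest
        }
    }
}
```

This is another reason plain serialization is awkward for fixed-length records: the first record in a stream is a different size from the others.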
 