
URLyBird data file format

 
Jon Poulton
Greenhorn
Posts: 27
Hello there,
I have a real problem. In the definition of the format of the data file in the URLyBird notes it says this:

All numeric values are stored in the header information use the formats of the DataInputStream and DataOutputStream classes. All text values, and all fields (which are text only), contain only 8 bit characters, null terminated if less than the maximum length for the field. The character encoding is 8 bit US ASCII.

The poor English in the first sentence is not a typo on my part - it's an exact quote. I assume what they mean is:

All numeric values that are stored in the header information use the formats of the DataInputStream and DataOutputStream classes.

Or even:

All numeric values stored in the header information use the formats of the DataInputStream and DataOutputStream classes.

Either sentence would make sense. Although it's not a big deal, it makes me doubt the accuracy of what they say elsewhere in the paragraph (indeed, the whole document). Especially where it says:

The character encoding is 8 bit US ASCII.

I was going to use the constructor for String which takes a byte array and a charset name to convert an array of bytes into the correct character string. However, the documentation for the constructor referred me to the Charset class for a list of allowed charsets. The only US ASCII one was:

US-ASCII - Seven-bit ASCII, a.k.a. ISO646-US, a.k.a. the Basic Latin block of the Unicode character set.

That's seven-bit US-ASCII, not EIGHT-bit US-ASCII. I assume that if I use this charset to decode the bytes I'm going to end up with the wrong characters. Am I missing something here?
 
Frank Hardy
Greenhorn
Posts: 13
Hi Jon,

Gosh! This puzzled me too! Additionally, I'm not a native English speaker.
I used the encoding "US-ASCII" and the results were fine. Just test it.
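
A quick test along those lines might look like the sketch below. It is not taken from the assignment: the class name, method names and sample bytes are made up for illustration, and it uses StandardCharsets, which assumes Java 7 or later (on older JDKs the String constructor taking a charset name does the same job). It decodes the same bytes with US-ASCII and with ISO-8859-1. The two should only disagree when a byte has its high bit set, in which case the US-ASCII decoder substitutes the replacement character U+FFFD.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper for comparing the two candidate charsets.
class CharsetComparison {

    // Decode with seven-bit US-ASCII; bytes >= 0x80 become U+FFFD.
    static String decodeAscii(byte[] raw) {
        return new String(raw, StandardCharsets.US_ASCII);
    }

    // Decode with ISO-8859-1; every byte maps to the code point of the same value.
    static String decodeLatin1(byte[] raw) {
        return new String(raw, StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        // Sample data only: one plain seven-bit value, one with a high-bit byte (0xE9).
        byte[] plain = "Hotel".getBytes(StandardCharsets.US_ASCII);
        byte[] highBit = {'C', 'a', 'f', (byte) 0xE9};

        System.out.println(decodeAscii(plain).equals(decodeLatin1(plain)));
        System.out.println(decodeAscii(highBit).equals(decodeLatin1(highBit)));
    }
}
```

If every byte in your data file is below 0x80, both charsets should give identical strings, which would explain why "US-ASCII" worked for me.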


Regards,

Franky.
 
Paul Bourdeaux
Ranch Hand
Posts: 783
Originally posted by Jon Poulton:
The character encoding is 8 bit US ASCII.
Hi Jon,
This has come up several times in the past. Some think it is a typo; others think Sun put it in to see if we would catch it (personally I think it is a typo). A little while ago (six months?) someone from this forum emailed Sun to inquire about it, and Sun replied saying that we could use ISO-8859-1. That is what I used in my assignment and I received max points in the Data Store area.
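
For what it's worth, here is a minimal sketch of how a field could be decoded under the rule quoted at the top of the thread (text is null terminated if shorter than the field's maximum length). It is not my submitted code: FieldDecoder and decodeField are hypothetical names, the sample bytes are invented, and the String constructor taking a Charset assumes Java 6 or later (StandardCharsets needs Java 7).

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical helper: turns one fixed-length field into a String.
class FieldDecoder {

    // Decode the bytes up to (not including) the first null byte,
    // or the whole array if the value fills the field completely.
    static String decodeField(byte[] field, Charset charset) {
        int end = 0;
        while (end < field.length && field[end] != 0) {
            end++;
        }
        return new String(field, 0, end, charset);
    }

    public static void main(String[] args) {
        // Invented sample: a 6-character value in an 8-byte field.
        byte[] field = {'P', 'a', 'l', 'a', 'c', 'e', 0, 0};
        System.out.println(decodeField(field, StandardCharsets.ISO_8859_1));
    }
}
```

Passing the charset as a parameter keeps the choice in one place, so switching between ISO-8859-1 and US-ASCII later is a one-line change.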

Do a search on "character encoding" or "8 bit US ASCII" and you will see several threads discussing this problem. Whatever you decide, remember to document this issue in your choices.txt!
 