URLyBird data file format

 
Jon Poulton
Greenhorn
Posts: 27
Hello there,
I have a real problem. In the definition of the format of the data file in the URLyBird notes it says this:

All numeric values are stored in the header information use the formats of the DataInputStream and DataOutputStream classes. All text values, and all fields (which are text only), contain only 8 bit characters, null terminated if less than the maximum length for the field. The character encoding is 8 bit US ASCII.

The poor English in the first sentence is not a typo on my part; it's an exact quote. I assume what they mean is:

All numeric values that are stored in the header information use the formats of the DataInputStream and DataOutputStream classes.

Or even:

All numeric values stored in the header information use the formats of the DataInputStream and DataOutputStream classes.

Either sentence would make sense. Although it's not a big deal, it makes me doubt the accuracy of what they say elsewhere in the paragraph (indeed, in the whole document). Especially where it says:

The character encoding is 8 bit US ASCII.

I was going to use the constructor for String which takes a byte array and a charset name to convert an array of bytes into the correct character string. However, the documentation for the constructor referred me to the Charset class for a list of allowed charsets. The only US ASCII one was:

US-ASCII - Seven-bit ASCII, a.k.a. ISO646-US, a.k.a. the Basic Latin block of the Unicode character set.

That's seven-bit US-ASCII, not EIGHT-bit US-ASCII. I assume that if I use this charset to decode the bytes, I'm going to end up with the wrong characters. Am I missing something here?
 
Frank Hardy
Greenhorn
Posts: 13
Hi Jon,

Gosh! This puzzled me too, and on top of that I'm not a native English speaker.
I used the encoding format "US-ASCII" and the results were fine. Just test it.
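
A quick test along these lines might look like the sketch below. It is only an illustration: the class name `CharsetComparison` and the sample bytes are made up here and do not come from the assignment. It decodes the same bytes with US-ASCII and with ISO-8859-1, so you can see that the two agree for bytes 0 to 127 and differ only when the high bit is set.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper class, not part of the URLyBird assignment.
class CharsetComparison {

    // Decode raw bytes as seven-bit US-ASCII.
    static String decodeAscii(byte[] raw) {
        return new String(raw, StandardCharsets.US_ASCII);
    }

    // Decode raw bytes as eight-bit ISO-8859-1 (Latin-1).
    static String decodeLatin1(byte[] raw) {
        return new String(raw, StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        // Made-up sample data: one plain ASCII value, one with a high-bit byte.
        byte[] plain = {'P', 'a', 'l', 'a', 'c', 'e'};
        byte[] highBit = {'C', 'a', 'f', (byte) 0xE9};

        // Bytes in the range 0..127 decode identically under both charsets.
        System.out.println(decodeAscii(plain).equals(decodeLatin1(plain)));

        // A byte above 127 is not valid US-ASCII, so the String constructor
        // substitutes a replacement character; ISO-8859-1 maps it directly.
        System.out.println(decodeAscii(highBit).equals(decodeLatin1(highBit)));
    }
}
```

If the data file really contains nothing above 127, both charsets give the same strings, which would explain why "US-ASCII" works fine in practice.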


Regards,

Franky.
 
Paul Bourdeaux
Ranch Hand
Posts: 783
Originally posted by Jon Poulton:
The character encoding is 8 bit US ASCII.
Hi Jon,
This has come up several times in the past. Some think it is a typo; others think Sun put it in to see if we would catch it (personally, I think it is a typo). A little while ago (six months?) someone from this forum emailed Sun to ask about it, and Sun replied that we could use ISO-8859-1. That is what I used in my assignment, and I received maximum points in the Data Store area.

Do a search on "character encoding" or "8 bit US ASCII" and you will see several threads discussing this problem. Whatever you decide, remember to document this issue in your choices.txt!
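
For illustration, here is a minimal sketch of reading one text field as the quoted paragraph describes it: a fixed number of bytes, null terminated if shorter than the maximum, decoded as ISO-8859-1. The class name `FieldReader`, the 2-byte length prefix, and the sample bytes are assumptions made up for this example; take the real header layout and field lengths from your own assignment instructions.

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;

// Hypothetical helper class, not part of the URLyBird assignment.
class FieldReader {

    // Reads exactly fieldLength bytes and decodes everything before the
    // first null byte (or the whole field if there is no null byte).
    static String readField(DataInputStream in, int fieldLength) throws IOException {
        byte[] raw = new byte[fieldLength];
        in.readFully(raw);
        int end = 0;
        while (end < raw.length && raw[end] != 0) {
            end++;
        }
        return new String(raw, 0, end, StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) throws IOException {
        // Made-up data: a 2-byte length (8) followed by an 8-byte field
        // holding "Hilton" padded with two null bytes.
        byte[] data = {0, 8, 'H', 'i', 'l', 't', 'o', 'n', 0, 0};
        DataInputStream in = new DataInputStream(new ByteArrayInputStream(data));

        // Numeric values use the DataInputStream format (big-endian).
        int length = in.readShort();
        System.out.println(readField(in, length));
    }
}
```

Swapping `ISO_8859_1` for `US_ASCII` in `readField` is a one-line change, so whichever charset you settle on, the decision is easy to isolate and to justify in choices.txt.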
 