
character encoding issue

 
Huan Niu
Greenhorn
Posts: 11
The assignment description contains the following statement:

... All text values, and all fields (which are text only), contain only 8 bit characters, null terminated if less than the maximum length for the field. The character encoding is 8 bit US ASCII.

Please note that it says "8 bit US ASCII".

When I call string.getBytes( charsetName ), I want to pass the proper charset name to get the bytes for the string. I looked up the Charset class in the Java 1.5 API, and it says:


US-ASCII Seven-bit ASCII, a.k.a. ISO646-US, a.k.a. the Basic Latin block of the Unicode character set
...
UTF-8 Eight-bit UCS Transformation Format


Which one should I use? Or are there any other suggestions?

Thanks a lot.
 
jesal dosa
Ranch Hand
Posts: 46
Use it like this:

String fieldValue = new String(field_name, "US-ASCII");
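
As a rough sketch of how this could fit the "null terminated if less than the maximum length" rule quoted above: the hypothetical helper below (the class name FieldDecoder and method decodeField are made up for illustration, not part of the assignment) stops at the first null byte before decoding. It assumes Java 7 or later for StandardCharsets; on older versions the charset name "US-ASCII" can be passed as a string instead.

```java
import java.nio.charset.StandardCharsets;

class FieldDecoder {
    // Hypothetical helper: decodes one fixed-length field,
    // ignoring everything from the first null byte onward.
    static String decodeField(byte[] field) {
        int length = 0;
        while (length < field.length && field[length] != 0) {
            length++;
        }
        return new String(field, 0, length, StandardCharsets.US_ASCII);
    }

    public static void main(String[] args) {
        // An 8-byte field holding a 6-character value padded with nulls.
        byte[] raw = {'P', 'a', 'l', 'a', 'c', 'e', 0, 0};
        System.out.println(decodeField(raw)); // prints "Palace"
    }
}
```

Without the null check, the padding bytes would end up in the string as '\u0000' characters.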
 
mohamed sulibi
Ranch Hand
Posts: 169
Hi all,

I would also like to ask: is it correct to use the following?

byte[] fieldByte = ....;
String field = new String(fieldByte);

best regards
m_darim
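
A note on the snippet above: new String(byte[]) without a charset argument decodes with the platform's default charset, so the result may differ from one machine to another. A minimal sketch comparing the two forms follows; the class name DefaultCharsetDemo and the method decodeExplicit are hypothetical, and StandardCharsets assumes Java 7 or later.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class DefaultCharsetDemo {
    // Decodes with an explicitly named charset, independent of the platform.
    static String decodeExplicit(byte[] fieldBytes) {
        return new String(fieldBytes, StandardCharsets.US_ASCII);
    }

    public static void main(String[] args) {
        byte[] fieldBytes = {'H', 'o', 't', 'e', 'l'};

        // The default charset depends on the JVM and operating system settings.
        System.out.println("Default charset: " + Charset.defaultCharset());

        // Uses the platform default charset.
        System.out.println(new String(fieldBytes));

        // Uses US-ASCII regardless of the platform.
        System.out.println(decodeExplicit(fieldBytes));
    }
}
```

For pure ASCII bytes both forms usually give the same string, but naming the charset removes the dependency on the environment.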
 
Huan Niu
Greenhorn
Posts: 11
Thanks for the reply.

Use it like this:

String fieldValue = new String(field_name, "US-ASCII");


But "US-ASCII" is 7-bit, not 8-bit.

Does this fulfil the requirement?
[ October 16, 2007: Message edited by: Huan Niu ]
 
jesal dosa
Ranch Hand
Posts: 46
Yes, "US-ASCII" is 7-bit, but if you search this forum you will find that "US-ASCII" is the correct one to use. I cannot remember what I searched for a couple of weeks ago, but someone confirmed it by asking Sun and posted the answer in a reply.

I hope this helps
 
Huan Niu
Greenhorn
Posts: 11
Hi, jesal

That is exactly what I was hoping for.

Thank you very much.
 
Edwin Dalorzo
Ranch Hand
Posts: 961
My specification says I should use 8-bit character encoding. If I used a 7-bit character encoding some characters would not be representable.

Some 8-bit character encodings that you can use are those of the ISO-8859 family, such as:

ISO-8859-1
ISO-8859-2
ISO-8859-3
ISO-8859-4
ISO-8859-5
ISO-8859-7
ISO-8859-9
ISO-8859-13
ISO-8859-15

There is also windows-1252, also known as Cp1252.

I would never recommend using US-ASCII, since it is a 7-bit character encoding, and that is not what the specification requires.
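
To illustrate the difference, here is a small sketch that round-trips a string containing a character above 127 through both encodings. The class name EncodingRoundTrip and the method roundTrip are made up for this example, and StandardCharsets assumes Java 7 or later.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class EncodingRoundTrip {
    // Encodes the text to bytes and decodes it back with the same charset.
    static String roundTrip(String text, Charset charset) {
        byte[] encoded = text.getBytes(charset);
        return new String(encoded, charset);
    }

    public static void main(String[] args) {
        String name = "Caf\u00e9"; // "Café": the last character is U+00E9

        // ISO-8859-1 maps U+00E9 to the single byte 0xE9, so nothing is lost.
        System.out.println(roundTrip(name, StandardCharsets.ISO_8859_1));

        // US-ASCII cannot represent U+00E9; getBytes substitutes '?'.
        System.out.println(roundTrip(name, StandardCharsets.US_ASCII)); // prints "Caf?"
    }
}
```

If every value in the data file stays within the 0 to 127 range, the two encodings produce identical bytes, so the choice only matters for characters outside that range.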

See Java Supported Encodings
 