You get different values from the getBytes() method depending on the character encoding you are using. That is exactly what getBytes() does: it assigns one or more bytes to represent each character, depending on the encoding. ASCII characters fit in one byte, but other character sets, such as Unicode, contain far more than 256 characters. Hence, encodings that cover them, such as UTF-8 or UTF-16, must use more than one byte for at least some characters.
You can determine the current default character encoding by means of the System class:
    String currentEncoding = System.getProperty("file.encoding");
Or by means of the Charset class:
    Charset.defaultCharset()
You can change the default encoding used by your application by setting this property when you launch the application, for instance:
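> java -Dfile.encoding=UTF-8
> java -Dfile.encoding=ASCII
> java -Dfile.encoding=UTF-16
> java -Dfile.encoding=Cp1252
> java -Dfile.encoding=Cp500
If you use UTF-16, for instance, getBytes() produces at least two bytes per character (four for characters outside the Basic Multilingual Plane) plus a two-byte byte-order mark at the start, whereas with ASCII every character occupies just one byte and characters outside ASCII are replaced by '?'. Note that since Java 18 the default charset is UTF-8 on all platforms (JEP 400), and setting file.encoding to anything other than UTF-8 or COMPAT has unspecified behavior there.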
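A minimal sketch for checking what the no-argument getBytes() actually uses; the class and method names (DefaultEncodingDemo, usesDefaultCharset) are illustrative only. Run it with and without one of the -Dfile.encoding flags above and compare the output; the values printed depend on your platform and Java version.

```java
import java.nio.charset.Charset;
import java.util.Arrays;

class DefaultEncodingDemo {
    // Returns true when the no-argument getBytes() gives the same bytes as an
    // explicit encoding with the JVM's default charset (hypothetical helper)
    static boolean usesDefaultCharset(String text) {
        return Arrays.equals(text.getBytes(), text.getBytes(Charset.defaultCharset()));
    }

    public static void main(String[] args) {
        // Both values depend on how the JVM was launched and on the Java version
        System.out.println("file.encoding    = " + System.getProperty("file.encoding"));
        System.out.println("defaultCharset() = " + Charset.defaultCharset());

        String text = "caf\u00e9"; // "cafe" with an e-acute
        System.out.println("getBytes() uses the default charset: " + usesDefaultCharset(text));
        System.out.println("bytes produced: " + text.getBytes().length);
    }
}
```
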
The String class has a method getBytes(String charsetName) that lets you choose the encoding used to generate the bytes. Notice how different encodings yield different numbers of bytes for the same text.
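The following sketch counts the bytes that getBytes(String) produces for the word "café" in a few encodings. The class and helper names are made up for illustration, and the counts in the comments assume a standard JDK in which all four encodings are available.

```java
import java.io.UnsupportedEncodingException;

class ByteCountDemo {
    // Number of bytes getBytes(String) produces for the given encoding (hypothetical helper)
    static int byteCount(String text, String encoding) {
        try {
            return text.getBytes(encoding).length;
        } catch (UnsupportedEncodingException e) {
            throw new IllegalArgumentException("Unknown encoding: " + encoding, e);
        }
    }

    public static void main(String[] args) {
        String text = "caf\u00e9"; // "cafe" with an e-acute, four characters
        for (String encoding : new String[] {"US-ASCII", "Cp1252", "UTF-8", "UTF-16"}) {
            System.out.println(encoding + ": " + byteCount(text, encoding) + " bytes");
        }
        // US-ASCII: 4 bytes  (the e-acute cannot be encoded and is replaced by '?')
        // Cp1252:   4 bytes  (the e-acute is the single byte 0xE9)
        // UTF-8:    5 bytes  (the e-acute takes two bytes, 0xC3 0xA9)
        // UTF-16:   10 bytes (2-byte byte-order mark + 2 bytes per character)
    }
}
```
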
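Another option is to use the CharsetEncoder and CharsetDecoder classes from java.nio.charset, which give you finer control, for instance over what happens when a character cannot be represented in the target encoding.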
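Here is a small sketch of that approach. Unlike getBytes(), which silently replaces characters it cannot encode, an encoder configured with CodingErrorAction.REPORT throws an exception. The encode/decode helper names are hypothetical.

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CharsetEncoder;
import java.nio.charset.CodingErrorAction;

class EncoderDecoderDemo {
    // Encodes text, throwing instead of substituting when a character is unmappable
    static byte[] encode(String text, String charsetName) throws CharacterCodingException {
        CharsetEncoder encoder = Charset.forName(charsetName).newEncoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        ByteBuffer buffer = encoder.encode(CharBuffer.wrap(text));
        byte[] bytes = new byte[buffer.remaining()];
        buffer.get(bytes);
        return bytes;
    }

    // Decodes bytes, throwing instead of substituting when the input is malformed
    static String decode(byte[] bytes, String charsetName) throws CharacterCodingException {
        CharsetDecoder decoder = Charset.forName(charsetName).newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        return decoder.decode(ByteBuffer.wrap(bytes)).toString();
    }

    public static void main(String[] args) throws CharacterCodingException {
        String text = "caf\u00e9"; // "cafe" with an e-acute
        byte[] utf8 = encode(text, "UTF-8");
        System.out.println("UTF-8 bytes: " + utf8.length); // 5
        System.out.println("round trip ok: " + decode(utf8, "UTF-8").equals(text)); // true
        try {
            encode(text, "US-ASCII");
        } catch (CharacterCodingException e) {
            // getBytes("US-ASCII") would silently produce '?'; the encoder reports the problem
            System.out.println("US-ASCII cannot encode the text: " + e);
        }
    }
}
```

Encoder and decoder instances are not safe for concurrent use by multiple threads, which is why this sketch creates a fresh one on every call.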
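If you use ASCII, the generated bytes correspond to the ASCII character codes, which means that if you cast every byte of the array to a char you get the corresponding ASCII character back.
With other encodings this no longer holds in general: UTF-16 adds a byte-order mark and two bytes per character, and an EBCDIC encoding such as Cp500 uses entirely different codes, so the same loop prints unexpected characters. (ASCII-compatible encodings such as UTF-8 or Cp1252 do give the same bytes as long as the text itself is pure ASCII.)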
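The sketch below casts each byte to a char, once for ASCII and once for UTF-16. The helper name bytesAsChars is hypothetical, and the 0xFF mask is a choice made here so that bytes above 127 map to characters 128 to 255 instead of being sign-extended.

```java
import java.io.UnsupportedEncodingException;

class BytesToCharsDemo {
    // Turns every byte into one char; only meaningful for single-byte ASCII data
    static String bytesAsChars(String text, String encoding) {
        byte[] bytes;
        try {
            bytes = text.getBytes(encoding);
        } catch (UnsupportedEncodingException e) {
            throw new IllegalArgumentException("Unknown encoding: " + encoding, e);
        }
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            // Mask with 0xFF so negative bytes become 128..255 rather than sign-extended values
            sb.append((char) (b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(bytesAsChars("Hello", "US-ASCII")); // prints Hello
        // Not "Hello": a byte-order mark followed by a zero byte before each letter
        System.out.println(bytesAsChars("Hello", "UTF-16"));
    }
}
```
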
[ May 26, 2006: Message edited by: Edwin Dalorzo ]