
java String UTF8

My goal is to take a String and convert it to UTF-8.
1. The approach above is wrong. See the comment.
2. I can't set my own default locale.
3. Before converting to UTF-8, we should know the string's original encoding. But how can I find that out?

Thanks
Strings in Java are always stored as Unicode, in UTF-16 (often loosely called UCS-2, although UCS-2 is an older fixed-width form that cannot represent characters outside the Basic Multilingual Plane). When you ask how to determine the encoding of a String, I assume you mean some series of bytes, for example in a file. Unfortunately, there is no reliable way to determine this from the bytes alone; you have to know the character encoding that was used to turn the characters into bytes. To get non-ASCII characters into a String in a Java source file you can use \u escapes. A character set is simply a mapping between numbers and characters (e.g. Unicode). A character encoding is a mapping between those numbers and sequences of bytes (e.g. UTF-8, UTF-16).

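import java.io.UnsupportedEncodingException;

public class Utf8Bytes {
    public static void main(String[] args) {
        // The Unicode escapes below spell "Hello".
        String myString = "\u0048\u0065\u006C\u006C\u006F World";
        System.out.println(myString);
        byte[] myBytes = null;

        try {
            // Encode the String's characters as UTF-8 bytes.
            myBytes = myString.getBytes("UTF-8");
        } catch (UnsupportedEncodingException e) {
            // Not expected in practice: every Java platform must support UTF-8.
            e.printStackTrace();
            System.exit(-1);
        }

        // Print each byte as a signed decimal value.
        for (int i = 0; i < myBytes.length; i++) {
            System.out.println(myBytes[i]);
        }
    }
}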
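
Going the other way (bytes back to a String) is where knowing the encoding matters. Here is a minimal sketch of that point; the class name DecodeDemo and the helper decode are made up for illustration and are not from the original question. It encodes two Chinese characters as UTF-8 and then decodes the same bytes once with the correct charset and once with the wrong one.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class DecodeDemo {
    // Hypothetical helper: decode raw bytes using the charset the caller believes they were written in.
    static String decode(byte[] bytes, Charset charset) {
        return new String(bytes, charset);
    }

    public static void main(String[] args) {
        String original = "\u67e5\u770b";
        byte[] utf8 = original.getBytes(StandardCharsets.UTF_8);

        // Same bytes, two different interpretations.
        String right = decode(utf8, StandardCharsets.UTF_8);
        String wrong = decode(utf8, StandardCharsets.ISO_8859_1);

        System.out.println(right.equals(original)); // true
        System.out.println(wrong.equals(original)); // false
        System.out.println(utf8.length + " bytes, read as " + wrong.length() + " chars with the wrong charset");
    }
}
```

Nothing in the byte array itself says "I am UTF-8", which is why the charset has to come from somewhere else (a file header, a protocol field, or a convention you agree on).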


Francis
Thanks.

1. If I have a string like this:

String aa = new String(" \u67e5\u770b\u5168\u90e8");
System.out.println(aa);

Sometimes the system outputs the UTF codes, and sometimes it outputs the actual Chinese characters. That looks odd. Why?

2. Is a character's UTF code the same on every system? Regardless of OS or locale, should a given Chinese character always have the same UTF code? Is that correct?

3. Are Unicode and UTF-8 different concepts? My understanding is that UTF-8 is a kind of Unicode. Is that right?

Thanks.
UTF-8 is not Unicode; it is one way of encoding Unicode. See:

http://www.cl.cam.ac.uk/%7Emgk25/unicode.html#unicode

for a good explanation of the differences.
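
To make the distinction concrete, here is a small sketch (the class name EncodingsDemo and the hex helper are my own, purely for illustration). The character keeps a single Unicode code point on every system, but its byte representation depends on which encoding you choose.

```java
import java.nio.charset.StandardCharsets;

public class EncodingsDemo {
    // Hypothetical helper: format bytes as space-separated uppercase hex.
    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String s = "\u67e5";
        // One code point, fixed by Unicode.
        System.out.println(String.format("code point: U+%04X", s.codePointAt(0)));
        // Two different byte sequences for that same code point.
        System.out.println("UTF-8:      " + hex(s.getBytes(StandardCharsets.UTF_8)));    // E6 9F A5
        System.out.println("UTF-16BE:   " + hex(s.getBytes(StandardCharsets.UTF_16BE))); // 67 E5
    }
}
```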

If you are finding that on one system your program works correctly and outputs Chinese characters, and on another it does not, the symptom tells you where to look. Empty squares almost certainly mean a font issue: you need a Unicode font installed (such as the Microsoft Arial Unicode font available on an MS Office CD) to see the full range of characters in a UTF-8 encoded file. Question marks usually mean that the encoding used for the output (for example the console's default charset) cannot represent those characters.

All these sorts of issues are covered under the subject of Internationalization (I18N). This is a good site on the subject:
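
Besides fonts, it can also help to take the platform default encoding out of the picture when printing. The following is only a sketch under assumptions: the class name Utf8Console and the helper utf8Stream are invented for this example, and it only helps if the terminal itself is set up to interpret UTF-8 and has a suitable font.

```java
import java.io.OutputStream;
import java.io.PrintStream;
import java.io.UnsupportedEncodingException;

public class Utf8Console {
    // Hypothetical helper: wrap a stream so printed text is encoded as UTF-8
    // instead of the platform default charset.
    static PrintStream utf8Stream(OutputStream out) {
        try {
            return new PrintStream(out, true, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // Not expected: every Java platform must support UTF-8.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        PrintStream out = utf8Stream(System.out);
        out.println("\u67e5\u770b\u5168\u90e8");
    }
}
```

If the output is still wrong with this in place, the bytes leaving the JVM are correct UTF-8, so the remaining suspects are the terminal's encoding setting and its font.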

http://www.joconner.com/javai18n/

regards,

Francis