My goal is to take a String and convert it to UTF-8. 1. The approach above is wrong; see the comment. 2. I can't set my own default locale. 3. Before we convert it into UTF-8, we should know the string's original encoding. But how can I find that out?
Strings in Java are always stored as UTF-16 (originally UCS-2). When you ask how you can determine the encoding of a String, I assume you mean some series of bytes in a file. Unfortunately, there is no way to determine this from the bytes alone; you have to know the character encoding that was used to encode the characters into bytes. To get non-ASCII characters into a String in a Java source file you can use \u escapes. A character set is simply a mapping between numbers and characters (e.g. Unicode). A character encoding is a mapping between those numbers and sequences of bytes (e.g. UTF-8, UTF-16).
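To make the character-set vs. character-encoding distinction concrete, here is a small sketch showing that the encoding only comes into play when you convert a String to or from bytes, and that you must name the encoding explicitly when decoding:

```java
import java.nio.charset.StandardCharsets;

public class CharsetDemo {
    public static void main(String[] args) {
        // A Java String is a sequence of UTF-16 code units; no "encoding"
        // is involved until we turn it into bytes.
        String s = "\u67e5\u770b\u5168\u90e8";

        // Encode the characters into bytes using UTF-8 explicitly.
        byte[] utf8Bytes = s.getBytes(StandardCharsets.UTF_8);

        // Decoding requires knowing the original encoding; here we know
        // it is UTF-8 because we just produced the bytes ourselves.
        String decoded = new String(utf8Bytes, StandardCharsets.UTF_8);

        System.out.println(decoded.equals(s)); // true
        // Each of these four Chinese characters takes 3 bytes in UTF-8.
        System.out.println(utf8Bytes.length);  // 12
    }
}
```

If you decoded those same bytes with the wrong charset (say ISO-8859-1), you would get a different, garbled String back, which is exactly why the encoding cannot be guessed from the bytes alone.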
String aa = "\u67e5\u770b\u5168\u90e8";
System.out.println(aa);
Sometimes the system outputs the \u escape codes, and sometimes it outputs the real Chinese characters. It looks weird. Why?
2. Is the UTF encoding the same on every system? No matter what OS or what locale, a Chinese character should have the same UTF code? Is this concept correct?
3. Are Unicode and UTF-8 different concepts? In my understanding, UTF-8 is a kind of Unicode. Is that right?
If you are finding that on one system your program works correctly and outputs Chinese characters, and on another it does not (maybe it prints empty squares or question marks), this is almost certainly a font issue. You need to have a Unicode font installed (such as the Microsoft Arial Unicode font available on an MS Office CD) to see the full range of characters in a UTF-8 encoded file.
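Besides fonts, the bytes that System.out writes depend on the platform default encoding, which differs between systems and locales; that is one reason the same program can behave differently on two machines. As a sketch (using a ByteArrayOutputStream just to make the bytes inspectable; the PrintStream constructor taking a Charset requires Java 10+), wrapping the output in a PrintStream with an explicit encoding makes the bytes the same everywhere:

```java
import java.io.ByteArrayOutputStream;
import java.io.PrintStream;
import java.nio.charset.StandardCharsets;

public class Utf8Out {
    public static void main(String[] args) {
        String s = "\u67e5\u770b\u5168\u90e8";

        // Capture what the PrintStream writes so we can inspect the raw bytes.
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();

        // Naming the charset explicitly removes any dependence on the
        // platform default encoding (PrintStream Charset ctor: Java 10+).
        PrintStream utf8Out = new PrintStream(buffer, true, StandardCharsets.UTF_8);
        utf8Out.print(s);

        // Four Chinese characters are 12 bytes in UTF-8, on every OS and locale.
        System.out.println(buffer.toByteArray().length); // 12
    }
}
```

This also answers question 2 above: the mapping from characters to UTF-8 bytes is fixed by the Unicode standard, so it is identical regardless of OS or locale; only the *default* encoding chosen by the platform varies.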
All these sorts of issues are covered under the subject of internationalization (I18N). This is a good site on the subject: