I'm trying to read an XML file that has Chinese characters in it & then output it to another XML file. But my output displays ??? when I open it with wordpad/IE.
Using WordPad, I pasted some Chinese characters that I got from a website & saved the file as a Unicode document (infile.xml). The input file displays the characters correctly when I open it with WordPad/IE.
I'd specified the encoding type to be UTF-8 for the output. Am I missing something here?
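Most Chinese characters fit in a single 16-bit Unicode code unit, and WordPad's "Unicode document" format saves them as UTF-16 (two bytes per character, normally little-endian with a byte-order mark at the start), not as UTF-8. If you open that file as UTF-8, or with the platform default encoding, the decoder treats the two bytes of each character as separate 8-bit values, which destroys the original 16-bit character. So, if I understand what's going on here correctly, you're reading each 16-bit Unicode character as two 8-bit characters, and specifying UTF-8 for the output can't repair text that was already decoded wrongly on the way in. The byte order (endian-ness) of the file also matters when you decode it. Try reading the input as UTF-16 and writing the output as UTF-8.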
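Since the original code wasn't posted, here is only a minimal sketch of the idea: decode the input as UTF-16, then encode the output as UTF-8. The class name Utf16ToUtf8, the method transcode and the output name outfile.xml are made up for illustration; infile.xml is the file from the question, and I'm assuming WordPad wrote it as UTF-16 with a byte-order mark.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper class, not taken from the original post.
public class Utf16ToUtf8 {

    // Decode the input as UTF-16 and re-encode the same characters as UTF-8.
    // Java's "UTF-16" decoder uses the byte-order mark to pick the byte order
    // and falls back to big-endian when no mark is present.
    static void transcode(Path in, Path out) throws IOException {
        try (BufferedReader reader = new BufferedReader(
                 new InputStreamReader(Files.newInputStream(in), StandardCharsets.UTF_16));
             BufferedWriter writer = new BufferedWriter(
                 new OutputStreamWriter(Files.newOutputStream(out), StandardCharsets.UTF_8))) {
            char[] buffer = new char[4096];
            int n;
            // Copy decoded characters, not raw bytes, so the encodings can differ.
            while ((n = reader.read(buffer)) != -1) {
                writer.write(buffer, 0, n);
            }
        }
    }

    public static void main(String[] args) throws IOException {
        // infile.xml is from the question; outfile.xml is an assumed name.
        transcode(Paths.get("infile.xml"), Paths.get("outfile.xml"));
    }
}
```

If the file carries an XML declaration with an encoding attribute, that attribute has to match the bytes actually written, otherwise the viewer may still show garbage. If you go through an XML parser instead of copying characters, hand it the raw input stream rather than a Reader opened with the wrong charset.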
Hope that helps. (Or even makes sense. I need another cup of coffee...)
Give a man a fish, he'll eat for one day. Teach a man to fish, he'll drink all your beer.
Cheers, Jeff (SCJP 1.4 all those years ago...)