Hello,
I was wondering if anyone could help clarify something for me. I don't really know how character sets work, so by all means let me know if my thinking is wrong. Sorry in advance for the long post.
Say user 1 has a system default charset of ASCII. They write a message in a JTextArea and hit the save button. The program calls JTextArea.write(myFileWriter), which saves the text to a file using the system default charset. They send the file to another user whose default charset is UTF-16. If the program simply loads the file into the JTextArea using JTextArea.read(myFileReader), wouldn't the text get jumbled up? The UTF-16 machine would be reading two bytes per character when in fact the file was written with one byte per character. The same is true the other way around: when the ASCII user loaded a UTF-16 file, it would treat each byte as one character when in fact two bytes represent one character. That is where my confusion is.
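To make the mismatch concrete, here is a minimal sketch (not your program, just an illustration) showing that the same string produces very different byte sequences under the two charsets, and that decoding one charset's bytes with the other garbles the text:

```java
import java.nio.charset.StandardCharsets;

public class CharsetMismatch {
    public static void main(String[] args) {
        String text = "Hello";

        // Same five characters, very different on-disk sizes:
        byte[] ascii = text.getBytes(StandardCharsets.US_ASCII); // 1 byte per char -> 5 bytes
        byte[] utf16 = text.getBytes(StandardCharsets.UTF_16);   // 2-byte BOM + 2 bytes per char -> 12 bytes
        System.out.println(ascii.length); // 5
        System.out.println(utf16.length); // 12

        // Decoding the ASCII bytes as UTF-16 pairs up unrelated bytes
        // into bogus characters -- this is the "jumbled" text you describe:
        System.out.println(new String(ascii, StandardCharsets.UTF_16));
    }
}
```

So yes: if the writer and reader silently use different default charsets, the round trip is not safe.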
The only way I could see to control this was to have a rule that says the files will always be in a specific format, say ASCII. Then before writing the contents to the file, I would call String.getBytes("ASCII") on the text of the JTextArea. When doing this on the UTF-16 machine, I assume that if it encountered a char whose value was > 127 (outside the 7-bit ASCII range), it would simply convert it to some char like "?" so it would fit in one byte? Then write that byte[] to the output stream.
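That matches my understanding of the documented behavior: String.getBytes(charset) replaces any character the charset can't represent with the charset's default replacement byte, which for ASCII is '?'. A small sketch:

```java
import java.nio.charset.StandardCharsets;

public class AsciiSave {
    public static void main(String[] args) {
        // The accented e (\u00E9) has no ASCII mapping:
        String text = "r\u00E9sum\u00E9"; // "résumé"

        // Unmappable characters are replaced with '?' (0x3F):
        byte[] out = text.getBytes(StandardCharsets.US_ASCII);

        System.out.println(new String(out, StandardCharsets.US_ASCII)); // r?sum?
    }
}
```

So the information loss happens exactly at the getBytes() call, before the bytes ever reach the output stream.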
Then to load the file, instead of using JTextArea.read(), I would have to read the bytes into a byte array, create a new String using String(byte[], "ASCII"), and pass that to the JTextArea?
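For what it's worth, the load path I have in mind would look roughly like this (the class and method names are just illustrative, not from my real program):

```java
import java.io.ByteArrayOutputStream;
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;

public class LoadAscii {
    // Read the whole file as raw bytes, then decode with an explicit
    // charset instead of the platform default.
    static String loadAsAscii(File f) throws IOException {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        FileInputStream in = new FileInputStream(f);
        try {
            byte[] chunk = new byte[4096];
            int n;
            while ((n = in.read(chunk)) != -1) {
                buf.write(chunk, 0, n);
            }
        } finally {
            in.close();
        }
        return new String(buf.toByteArray(), StandardCharsets.US_ASCII);
    }
    // then something like: myTextArea.setText(loadAsAscii(file));
}
```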
Any information dropped from the UTF-16 file would simply show up as "?" on the ASCII machine, and on the UTF-16 machine everything would look fine? No double-spaced characters or anything like that?
Is there another way?
Jim