
Internationalization and Unicode

 
Layne Lund
Ranch Hand
Posts: 3061
I am writing a simple flashcard program for a friend. It reads in a set of words, displays the English version of each, and prompts the user to type in the German translation. My development machine runs Red Hat Linux 8.0, but my friend's system runs Windows XP. So far this hasn't been much of an issue; I can run my compiled .class files on his machine just fine (score one for Java!).

My problem is with characters that aren't in the ASCII set. Here's a hypothetical scenario: String a is read from the file, and the word contains an umlaut. String b is (basically) obtained from System.in. Even when the user types the word correctly, a.equals(b) apparently returns false, since my program prints its "incorrect" message.
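To illustrate how two strings that should hold the same word can fail equals(): if the same bytes are decoded with two different charsets, the resulting Strings differ. This is a minimal sketch, not the program from this thread; the class name, the word "schön", and the choice of UTF-8 versus ISO-8859-1 are assumptions for illustration and are not necessarily the encodings involved on the Windows XP machine.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class CharsetMismatchDemo {
    // Encodes text with one charset, then decodes the resulting bytes with another.
    static String roundTrip(String text, Charset writtenAs, Charset readAs) {
        byte[] bytes = text.getBytes(writtenAs);
        return new String(bytes, readAs);
    }

    public static void main(String[] args) {
        // "schön", written with a Unicode escape so this source file's own encoding doesn't matter
        String word = "sch\u00F6n";

        String same = roundTrip(word, StandardCharsets.UTF_8, StandardCharsets.UTF_8);
        String mixed = roundTrip(word, StandardCharsets.UTF_8, StandardCharsets.ISO_8859_1);

        System.out.println(word.equals(same));  // true
        System.out.println(word.equals(mixed)); // false
        // The two UTF-8 bytes of "ö" (0xC3 0xB6) were read as two separate Latin-1 characters.
        System.out.println(mixed.length());     // 6
    }
}
```

The strings look alike to a person, but equals() compares chars, so any mismatch between the encoding used to write the data and the one used to read it is enough to break the comparison.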

Perhaps I should post some of the relevant code. In fact, the program is short enough that I can post the whole thing:

Are there any issues that I'm overlooking? I've never played with the non-ASCII portion of Unicode before, so I'm at a loss. Any help will be greatly appreciated.

Layne
 
Jim Yingst
Wanderer
Sheriff
Posts: 18671
Do you know for certain that the file was written using UTF-16? If so, you're doing the right thing here; if not, you need to find out what encoding was used, and use the same encoding to read it. Since you're evidently sharing files between different systems, do not rely on default encodings. That is, do not use a FileReader or FileWriter, and don't use the single-argument constructors for InputStreamReader or OutputStreamWriter. Use only the constructors that allow you to explicitly specify an encoding. Which is exactly what you've done in the code above - but since you don't say how the file was originally written, I thought it worth a mention.
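As a sketch of that advice, here is one way to write and read a word list with the charset named explicitly on both sides, so the result doesn't depend on either machine's default encoding. The class and method names, the one-word-per-line format, and the temp file are hypothetical; UTF-16 is used only because it is the encoding discussed above.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class ExplicitEncodingIO {
    // Writes one word per line using the given charset (never the platform default).
    static void writeWords(File file, List<String> words, Charset cs) throws IOException {
        try (BufferedWriter out = new BufferedWriter(
                new OutputStreamWriter(new FileOutputStream(file), cs))) {
            for (String w : words) {
                out.write(w);
                out.newLine();
            }
        }
    }

    // Reads the words back, again naming the charset explicitly.
    static List<String> readWords(File file, Charset cs) throws IOException {
        List<String> words = new ArrayList<>();
        try (BufferedReader in = new BufferedReader(
                new InputStreamReader(new FileInputStream(file), cs))) {
            String line;
            while ((line = in.readLine()) != null) {
                words.add(line);
            }
        }
        return words;
    }

    public static void main(String[] args) throws IOException {
        File file = File.createTempFile("flashcards", ".txt");
        file.deleteOnExit();
        List<String> words = Arrays.asList("Haus", "sch\u00F6n", "M\u00FCller");
        writeWords(file, words, StandardCharsets.UTF_16);
        System.out.println(readWords(file, StandardCharsets.UTF_16).equals(words)); // true
    }
}
```

Java's "UTF-16" charset writes a big-endian byte-order mark when encoding and uses (and strips) a byte-order mark when decoding, so the BOM does not end up in the first word.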

Having said that, I think the most likely problem here is that the user does not have the ability to write a ü to the standard input stream. Have your friend open a command line window and try to type ü. If they can do it, great; otherwise, your best bet may be to do the spelling comparison in a more forgiving manner. I'd use a Collator:

Experiment with the strength (and perhaps the Locale) of the Collator until you find a level that is lax enough to allow u and ü to be seen as equivalent.
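A minimal sketch of the Collator approach, assuming a German-locale Collator; the class and method names are hypothetical. At PRIMARY strength accents and case are ignored, so "schon" matches "schön"; at SECONDARY strength the umlaut counts again but case is still ignored.

```java
import java.text.Collator;
import java.util.Locale;

class LenientCompare {
    // Compares two words using a Collator at the given strength
    // (Collator.PRIMARY, SECONDARY, TERTIARY or IDENTICAL).
    static boolean matches(String expected, String typed, int strength) {
        Collator collator = Collator.getInstance(Locale.GERMAN); // returns a fresh instance
        collator.setStrength(strength);
        // Also treat decomposed input ("u" followed by a combining diaeresis) like the precomposed letter.
        collator.setDecomposition(Collator.CANONICAL_DECOMPOSITION);
        return collator.equals(expected, typed);
    }

    public static void main(String[] args) {
        String expected = "sch\u00F6n"; // "schön"
        System.out.println(matches(expected, "schon", Collator.PRIMARY));        // true
        System.out.println(matches(expected, "schon", Collator.SECONDARY));      // false
        System.out.println(matches(expected, "SCH\u00D6N", Collator.SECONDARY)); // true
        System.out.println(matches("Haus", "Maus", Collator.PRIMARY));           // false
    }
}
```

The trade-off of leniency shows up here: "schon" ("already") and "schön" ("beautiful") are different German words, yet PRIMARY strength accepts one for the other.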

Hope that helps...
 
Layne Lund
Ranch Hand
Posts: 3061
Do you know for certain that the file was written using UTF-16? If so, you're doing the right thing here; if not, you need to find out what encoding was used...

No, I'm not certain that UTF-16 is used. It just happens to work correctly on my machine at home. The idea is that the user can create the input files that the program reads. On my system, I used OpenOffice.org to save the text files in Unicode format, which apparently means UTF-16. On my friend's WinXP system, we used WordPad to save the text file (again in Unicode format).
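When files can come from different editors, one way to find out how a particular file was saved is to look at its first bytes for a byte-order mark (BOM). This is a minimal sketch under stated assumptions: the class and method names are hypothetical, WordPad's "Unicode" option is assumed to write UTF-16 little-endian with a BOM (OpenOffice.org's output is not confirmed here), and a file without a BOM cannot be identified this way, so it falls back to a charset the caller chooses.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class BomSniffer {
    // Returns the charset implied by a byte-order mark, or null if there is none.
    // UTF-32 BOMs are not handled in this sketch.
    static Charset charsetFromBom(byte[] data) {
        if (startsWith(data, 0xEF, 0xBB, 0xBF)) return StandardCharsets.UTF_8;
        if (startsWith(data, 0xFF, 0xFE)) return StandardCharsets.UTF_16LE;
        if (startsWith(data, 0xFE, 0xFF)) return StandardCharsets.UTF_16BE;
        return null;
    }

    // Decodes the data, skipping the BOM if present; otherwise uses the fallback charset.
    static String decode(byte[] data, Charset fallback) {
        Charset cs = charsetFromBom(data);
        if (cs == null) {
            return new String(data, fallback);
        }
        int bomLength = cs.equals(StandardCharsets.UTF_8) ? 3 : 2;
        return new String(data, bomLength, data.length - bomLength, cs);
    }

    private static boolean startsWith(byte[] data, int... prefix) {
        if (data.length < prefix.length) return false;
        for (int i = 0; i < prefix.length; i++) {
            if ((data[i] & 0xFF) != prefix[i]) return false;
        }
        return true;
    }

    public static void main(String[] args) {
        // Simulate a file saved as UTF-16LE with a BOM: bytes FF FE, then the text.
        byte[] text = "sch\u00F6n".getBytes(StandardCharsets.UTF_16LE);
        byte[] data = new byte[text.length + 2];
        data[0] = (byte) 0xFF;
        data[1] = (byte) 0xFE;
        System.arraycopy(text, 0, data, 2, text.length);

        System.out.println(charsetFromBom(data).name());                                // UTF-16LE
        System.out.println(decode(data, StandardCharsets.ISO_8859_1).equals("sch\u00F6n")); // true
    }
}
```

In a real program the byte array would come from the start of the word file (for example via java.nio.file.Files.readAllBytes).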

I think the most likely problem here is that the user does not have the ability to write a ü to the standard input stream. Have your friend open a command line window and try to type ü.

We did that already, and it works. WordPad shows the o-umlaut that I typed, and when I view the file with "more" from the command line, it displays correctly as well. However, when I print the word from my Java program, it doesn't show up correctly. I should have pasted the output with my original message. Unfortunately, I'm on a different machine at the moment; I'll paste it here when I go to my friend's house later tonight.

If they can do it, great; otherwise, your best bet may be to do the spelling comparison in a more forgiving manner. I'd use a Collator...

I'll check this out in the API docs. Thanks for the suggestion. I'd still like to figure out what I'm doing wrong with the straight comparison, so any more input will be appreciated.

Thanks.
 
Layne Lund
Ranch Hand
Posts: 3061
Before I posted my original question, I was already working on a basic Swing GUI for this program. I finally finished it and tested it on my friend's computer, and the international characters work fine. So my guess is that the original issue was with running the program in the console on Windows XP. It might be an interesting academic exercise to figure out what the difference is, but I don't think I care enough about it anymore.

Thanks for your help, Jim.
 
Layne Lund
Ranch Hand
Posts: 3061
I had a sudden idea this morning, but I haven't had time to test it yet. I just thought I'd post it here to see what you think:

change

to

This would make sure that I'm using UTF-16 encoding while reading input from the console. Perhaps I also need to create an OutputStreamWriter for System.out in the same way.
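A minimal sketch of that idea, wrapping both System.in and System.out with one explicitly chosen charset. The class and method names are hypothetical, and the charset is passed in rather than hard-coded, because which one is right is exactly what needs testing: a Windows command-line window typically exchanges bytes in an OEM code page (the `chcp` command reports the active one, e.g. 437 or 850), so trying that code page's Java name, such as `Cp850`, alongside UTF-16 is an assumption worth checking on the target machine.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.OutputStream;
import java.io.OutputStreamWriter;
import java.io.PrintWriter;
import java.nio.charset.Charset;

class ConsoleWithCharset {
    // Asks for one word, decoding input and encoding output with the same charset.
    static boolean quiz(InputStream inStream, OutputStream outStream, Charset cs, String expected)
            throws IOException {
        BufferedReader in = new BufferedReader(new InputStreamReader(inStream, cs));
        PrintWriter out = new PrintWriter(new OutputStreamWriter(outStream, cs), true); // autoflush on println
        out.println("Type the German word for 'beautiful':");
        String answer = in.readLine();
        boolean correct = expected.equals(answer);
        out.println(correct ? "Correct!" : "Incorrect, expected: " + expected);
        return correct;
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical usage: java ConsoleWithCharset Cp850 (falls back to the JVM default).
        Charset cs = args.length > 0 ? Charset.forName(args[0]) : Charset.defaultCharset();
        quiz(System.in, System.out, cs, "sch\u00F6n");
    }
}
```

If the charset given here differs from the one the console actually uses, the typed word decodes to different chars and the comparison fails, which matches the symptom described earlier in the thread.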

Any comments?

Layne
 