
Japanese character set

 
Anjali S Sharma
Ranch Hand
Posts: 279
What problems might one encounter when using a Japanese character set, and how can they be solved?

I have learned that the Japanese character set is a 2-byte character set. Will that have any effect on the application we intend to develop in Japanese?
 
Campbell Ritchie
Sheriff
Pie
Posts: 49733
69
Not got a lot of time to reply at present, but it means all Strings need two chars per code point. I posted about code points on this site in the last week; please search for that post, which I can't repeat for copyright reasons.

Don't know what else you will have to change.
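
A rough way to check the chars-per-code-point question is to compare `String.length()` with `String.codePointCount()`. The sketch below is only an illustration: the class name `CodePointDemo` and the sample strings are assumptions, not something from this thread. Kana and the common kanji lie in the Basic Multilingual Plane, so each is a single `char`; only supplementary characters such as U+20BB7 need two.

```java
public class CodePointDemo {

    // Number of UTF-16 code units (Java chars) in the text.
    static int charCount(String text) {
        return text.length();
    }

    // Number of Unicode code points in the text.
    static int codePointCount(String text) {
        return text.codePointCount(0, text.length());
    }

    public static void main(String[] args) {
        // Three common kanji, written as escapes so the source file encoding does not matter.
        String common = "\u65E5\u672C\u8A9E";
        // One supplementary character (U+20BB7), stored as a surrogate pair.
        String rare = new String(Character.toChars(0x20BB7));

        // prints "common: 3 chars, 3 code points"
        System.out.println("common: " + charCount(common) + " chars, "
                + codePointCount(common) + " code points");
        // prints "rare: 2 chars, 1 code points"
        System.out.println("rare: " + charCount(rare) + " chars, "
                + codePointCount(rare) + " code points");
    }
}
```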
 
Anjali S Sharma
Ranch Hand
Posts: 279
Originally posted by Campbell Ritchie:
Not got a lot of time to reply at present, but it means all Strings need two chars per code point. I posted about code points on this site in the last week; please search for that post, which I can't repeat for copyright reasons.

Don't know what else you will have to change.


Thanks for replying. Does that mean I have to use UTF-16, or will UTF-8 work as well?
 
Campbell Ritchie
Sheriff
Pie
Posts: 49733
69
You're welcome. Got a bit more time now, and have found my old post, here. The quote I posted suggests you probably will have to use UTF-16.
 
Anjali S Sharma
Ranch Hand
Posts: 279
Originally posted by Campbell Ritchie:
You're welcome. Got a bit more time now, and have found my old post, here. The quote I posted suggests you probably will have to use UTF-16.


Thanks for the post.

This is what I have come to understand. If there is anything to correct in the list or add to it, please let me know.


We can use either UTF-8 or UTF-16 without any problems. UTF-8 is the better choice if there is plenty of Western text too; otherwise it becomes less efficient, since Japanese characters typically take 3 bytes each in UTF-8, and occasionally 4. The only things one needs to watch out for when using Japanese characters are:

1. Files should be read and written using Reader/Writer (Java handles the encoding internally) and not InputStream/OutputStream.
2. Any other software involved (such as XML parsers) should use the same encoding (UTF-8 or UTF-16) with which the file to be parsed was created.
3. The database encoding is another thing that must be set correctly.
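
To put numbers on the size remark above, here is a minimal sketch that counts the bytes a string occupies in each encoding. The class name `EncodingSizeDemo` and the sample strings are illustrative assumptions; UTF-16BE is used so that no byte order mark is counted.

```java
import java.nio.charset.StandardCharsets;

public class EncodingSizeDemo {

    // Number of bytes the text occupies in UTF-8.
    static int utf8Length(String text) {
        return text.getBytes(StandardCharsets.UTF_8).length;
    }

    // Number of bytes the text occupies in UTF-16 (big-endian, no byte order mark).
    static int utf16Length(String text) {
        return text.getBytes(StandardCharsets.UTF_16BE).length;
    }

    public static void main(String[] args) {
        String japanese = "\u65E5\u672C\u8A9E"; // three kanji
        String english = "Japanese";            // eight ASCII letters

        // prints "Japanese text: UTF-8 = 9 bytes, UTF-16 = 6 bytes"
        System.out.println("Japanese text: UTF-8 = " + utf8Length(japanese)
                + " bytes, UTF-16 = " + utf16Length(japanese) + " bytes");
        // prints "English text: UTF-8 = 8 bytes, UTF-16 = 16 bytes"
        System.out.println("English text: UTF-8 = " + utf8Length(english)
                + " bytes, UTF-16 = " + utf16Length(english) + " bytes");
    }
}
```

So for mostly Japanese text UTF-16 is the more compact of the two, and for mostly Western text UTF-8 is.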
 
Campbell Ritchie
Sheriff
Pie
Posts: 49733
69
Don't know any more about it, but I think your 3 points are all correct. Not certain, however.
 
Guido Sautter
Ranch Hand
Posts: 142
What's the difference between UTF-8 and UTF-16, with regard to what characters (code points) they can encode? I used to think both are (slightly different) ways of representing Unicode characters (code points) as bytes ...
 
Paul Clapham
Sheriff
Posts: 21298
32
Originally posted by Campbell Ritchie:
Don't know any more about it, but I think your 3 points are all correct. Not certain, however.


Number 1 is correct provided that the encoding of the file is the same as your system's default encoding. (This is unlikely to be the case if the file's encoding is UTF-8 or UTF-16.) Or provided you use an InputStreamReader or OutputStreamWriter which specifies the correct encoding.
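
A minimal sketch of the second option, with the encoding named explicitly on both the writing and the reading side. The class name `ExplicitEncodingDemo`, its helper methods and the temporary file are assumptions made for illustration only.

```java
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

public class ExplicitEncodingDemo {

    // Writes the text to the file using the given encoding, not the platform default.
    static void write(Path file, String text, Charset charset) {
        try (Writer out = new OutputStreamWriter(Files.newOutputStream(file), charset)) {
            out.write(text);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Reads the whole file, decoding its bytes with the given encoding.
    static String read(Path file, Charset charset) {
        StringBuilder sb = new StringBuilder();
        try (Reader in = new InputStreamReader(Files.newInputStream(file), charset)) {
            char[] buf = new char[1024];
            int n;
            while ((n = in.read(buf)) != -1) {
                sb.append(buf, 0, n);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return sb.toString();
    }

    // Writes the text to a temporary file and reads it back with the same encoding.
    static String roundTrip(String text, Charset charset) {
        try {
            Path file = Files.createTempFile("encoding-demo", ".txt");
            try {
                write(file, text, charset);
                return read(file, charset);
            } finally {
                Files.deleteIfExists(file);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        String text = "\u65E5\u672C\u8A9E"; // three kanji
        System.out.println(roundTrip(text, StandardCharsets.UTF_8).equals(text));  // prints "true"
        System.out.println(roundTrip(text, StandardCharsets.UTF_16).equals(text)); // prints "true"
    }
}
```

Reading the file back with a different charset than the one used to write it is what produces garbled text, so the charset argument should come from wherever the file's encoding is actually known.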

Number 2: if your XML document is encoded in UTF-8 or UTF-16, then the standard XML parsers will be able to detect that and work correctly. (That's required by the XML spec.) This is provided you give them the chance. So pass them an InputStream or a File or a URL and they will deal with it. If you pass a Reader, then it's your responsibility to get the encoding right, so it's best not to do that.
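
As a sketch of "give them the chance": the example below hands the JDK's built-in DOM parser raw bytes through an InputStream, once in UTF-8 and once in UTF-16, and lets it work out the encoding itself. The class name `XmlEncodingDemo`, the `greeting` element and the helper methods are assumptions for illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;

public class XmlEncodingDemo {

    // Builds a tiny XML document and encodes it with the given charset.
    // encodingName is what goes into the XML declaration.
    static byte[] encode(String encodingName, Charset charset, String text) {
        String xml = "<?xml version=\"1.0\" encoding=\"" + encodingName + "\"?>"
                + "<greeting>" + text + "</greeting>";
        return xml.getBytes(charset);
    }

    // Parses the bytes via an InputStream, so the parser detects the encoding itself,
    // and returns the text content of the root element.
    static String parseRootText(byte[] xmlBytes) {
        try (InputStream in = new ByteArrayInputStream(xmlBytes)) {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(in);
            return doc.getDocumentElement().getTextContent();
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse XML", e);
        }
    }

    public static void main(String[] args) {
        String text = "\u3053\u3093\u306B\u3061\u306F"; // a greeting in hiragana

        byte[] utf8 = encode("UTF-8", StandardCharsets.UTF_8, text);
        // StandardCharsets.UTF_16 writes a byte order mark when encoding.
        byte[] utf16 = encode("UTF-16", StandardCharsets.UTF_16, text);

        System.out.println(parseRootText(utf8).equals(text));  // prints "true"
        System.out.println(parseRootText(utf16).equals(text)); // prints "true"
    }
}
```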

Number 3: definitely.
 
Paul Clapham
Sheriff
Posts: 21298
32
Originally posted by Guido Sautter:
What's the difference between UTF-8 and UTF-16, with regard to what characters (code points) they can encode?
They can both encode all Unicode characters.
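
One way to see this for yourself is to encode a string and decode it again. The sketch below is only an illustration under assumed names (`RoundTripDemo`, `survivesRoundTrip`); the sample mixes kana, kanji and one supplementary character, and US-ASCII is included for contrast because it cannot represent any of them.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class RoundTripDemo {

    // Encodes the text to bytes and decodes it again with the same charset.
    // Returns true only if nothing was lost on the way.
    static boolean survivesRoundTrip(String text, Charset charset) {
        byte[] encoded = text.getBytes(charset);
        return new String(encoded, charset).equals(text);
    }

    public static void main(String[] args) {
        // Hiragana, katakana, a kanji, and one supplementary character (U+20BB7).
        String sample = "\u3042\u30A2\u6F22" + new String(Character.toChars(0x20BB7));

        System.out.println(survivesRoundTrip(sample, StandardCharsets.UTF_8));    // prints "true"
        System.out.println(survivesRoundTrip(sample, StandardCharsets.UTF_16));   // prints "true"
        System.out.println(survivesRoundTrip(sample, StandardCharsets.US_ASCII)); // prints "false"
    }
}
```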
 
Guido Sautter
Ranch Hand
Posts: 142
Originally posted by Paul Clapham:
They can both encode all Unicode characters.


Then why did you answer the question of whether UTF-8 would work or UTF-16 had to be used by saying that UTF-16 was the one to use? I thought UTF-8 would work just as well. That confused me a little ...
 
Paul Clapham
Sheriff
Posts: 21298
32
Originally posted by Guido Sautter:
Then why did you answer the question of whether UTF-8 would work or UTF-16 had to be used by saying that UTF-16 was the one to use? I thought UTF-8 would work just as well. That confused me a little ...
I'm not sure what you are saying I said, but I don't see anywhere I said anything like what you seem to be saying I said. I'm confused.
 
Guido Sautter
Ranch Hand
Posts: 142
Sorry Paul, Campbell was the one who said what confused me ...
 
Campbell Ritchie
Sheriff
Pie
Posts: 49733
69
I'm sorry about that.
 