Playing around with Unicode

 
Jake Mauve
ahoy people =)

I am trying to make a basic (and useless) 'matrix-code' generator, and was thinking of using our alphabet, our numbers and 'basic' Japanese symbols (hiragana and katakana). I already know the Unicode code points for all these letters and symbols.
The problem is, I want to avoid having to predefine every single Unicode escape by hand. I was hoping to randomly generate a hexadecimal value in the right range and then use that value as the Unicode 'code'.

this is more or less what I'm hoping to do:


This yields an 'illegal unicode escape' error message from the compiler.

could someone please help ? ^^
Thanks a lot for your time and concern xD
J
 
Rob Spoor
You need to understand that the part after the \u is the hexadecimal representation of the character. Once you have the 304A part, all you need to do is parse it into a char. Fortunately, char is a numeric type in Java, so you can first parse the hex digits to an int, then cast that int to a char. To turn that char into a String you can use String.valueOf.
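A minimal sketch of those three steps. The class and method names (HexToChar, fromHex) are made up for illustration, and the cast only works for code points that fit in a single char (up to FFFF):

```java
class HexToChar {
    // Turn a hex string such as "304A" into a one-character String.
    // Assumes the value fits in a single 16-bit char (at most FFFF).
    static String fromHex(String hex) {
        int codePoint = Integer.parseInt(hex, 16); // parse the digits as base 16
        char c = (char) codePoint;                 // char is a 16-bit numeric type
        return String.valueOf(c);                  // wrap the char in a String
    }

    public static void main(String[] args) {
        System.out.println(fromHex("304A")); // HIRAGANA LETTER O
        System.out.println(fromHex("0041")); // LATIN CAPITAL LETTER A
    }
}
```

No backslash-u escape is involved at all here: the hex digits are ordinary String data that gets converted at run time.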
 
Jeff Verdegan
1. ItDoesntWorkIsUseless.

2. When you put String s = "\u1234" in your Java source code, it's not a String with '\', 'u', '1', '2', '3', '4'. Rather, before the compiler even sees the String s = " part, the \u1234 has been replaced with the actual character. So appending "\u" to some digits at run time is too late, and a "\u" in source that isn't followed by four hex digits is exactly what the compiler rejects as an illegal unicode escape.

Try this:
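Something along these lines, as a sketch rather than a definitive version: pick a random number inside a known range and cast it to char. The names MatrixLine, RANGES, randomChar and randomLine are hypothetical, and the block boundaries are my reading of the Unicode charts for digits, Latin capitals, hiragana and katakana.

```java
import java.util.Random;

class MatrixLine {
    // Inclusive code point ranges to draw from (assumed boundaries).
    static final int[][] RANGES = {
        {0x0030, 0x0039}, // digits 0-9
        {0x0041, 0x005A}, // Latin capitals A-Z
        {0x3041, 0x3096}, // hiragana letters
        {0x30A1, 0x30FA}, // katakana letters
    };

    // Pick one of the ranges, then a code point inside it, and cast to char.
    static char randomChar(Random rnd) {
        int[] range = RANGES[rnd.nextInt(RANGES.length)];
        int codePoint = range[0] + rnd.nextInt(range[1] - range[0] + 1);
        return (char) codePoint;
    }

    // Build a line of the requested length out of random characters.
    static String randomLine(Random rnd, int length) {
        StringBuilder sb = new StringBuilder(length);
        for (int i = 0; i < length; i++) {
            sb.append(randomChar(rnd));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(randomLine(new Random(), 40));
    }
}
```

Picking the range first gives each alphabet the same weight regardless of how many characters it has; pick from one flat list of code points instead if you want every character to be equally likely.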


And note that unless you've explicitly set your console window to an appropriate encoding and font, there's a good chance that Japanese characters will just show up as question marks or squares. If that happens, try using a GUI element such as a JTextPane instead.

For (a really braindead) example:
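A minimal sketch of such a window. The class name UnicodeWindow and the sample text are made up for illustration, and whether the characters actually render still depends on a font with Japanese glyphs being installed:

```java
import javax.swing.JFrame;
import javax.swing.JScrollPane;
import javax.swing.JTextPane;
import javax.swing.SwingUtilities;

class UnicodeWindow {
    // Build a short sample: every second code point from 0x3042 to 0x304A,
    // which are the five hiragana vowels.
    static String sampleText() {
        StringBuilder sb = new StringBuilder();
        for (int cp = 0x3042; cp <= 0x304A; cp += 2) {
            sb.append((char) cp);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Swing components should be created on the event dispatch thread.
        SwingUtilities.invokeLater(() -> {
            JTextPane pane = new JTextPane();
            pane.setText(sampleText());

            JFrame frame = new JFrame("Unicode test");
            frame.setDefaultCloseOperation(JFrame.EXIT_ON_CLOSE);
            frame.add(new JScrollPane(pane));
            frame.setSize(300, 120);
            frame.setVisible(true);
        });
    }
}
```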

 
Jake Mauve
Hey, Rob and Jeff, my favorite repliers ;) Thanks a lot for the info, I will get right on it. I had guessed that using the char type could help, but I was having the same problem with my basic 'manipulation'.
Just one quick thing. It's true that on the system terminal, CMD, Japanese chars tend to come out as other misc symbols. But the terminal in my IDE (BlueJ) has no problems displaying special symbols. So my question would be, is there a small system tweak or change I need to make so that my CMD works fine with all types of Unicode 'symbols'/letters?
So far I'm not working with any graphic elements within Java, I'm just using a simple System.out.print() ^^
J
 
Jeff Verdegan
Jake Mauve wrote: So my question would be, is there a small system tweak or change I need to make so that my CMD works fine with all types of Unicode 'symbols'/letters?
So far I'm not working with any graphic elements within Java, I'm just using a simple System.out.print() ^^


Like I said, there are two main issues: Encoding and Font. You can set the Font on a CMD window (click the icon in the top left of the menu bar and select Properties), but I only get a couple of choices for mine, and I don't know of any way to change the Encoding.

Somewhere there's a system setting for supported languages and input methods. I forget just where. Adding Japanese to that might make it available in CMD windows as well. Not sure though.
 
Martin Vajsar
You'll never be able to display all Unicode characters at once in the Windows console.

There is a system-wide setting somewhere in the Regional settings in Control Panel, which lets you select a "codepage" to use with consoles. This codepage maps 8-bit characters onto a subset of Unicode characters (which is the fundamental reason why it cannot ever display all Unicode characters at once), and there certainly are some codepages designed for Japanese locales, though I haven't the slightest idea what they look like.
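To see the "subset" point from Java's side, here is a small sketch that asks a charset whether it can encode a given character at all. The names CodepageCheck and canShow are made up, and Shift_JIS is only used if the running JRE happens to ship it:

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class CodepageCheck {
    // True if the charset has a byte representation for the character.
    static boolean canShow(Charset cs, char c) {
        return cs.newEncoder().canEncode(c);
    }

    public static void main(String[] args) {
        char hiraganaA = (char) 0x3042; // HIRAGANA LETTER A

        // A Western 8-bit charset has no slot for hiragana.
        System.out.println(canShow(StandardCharsets.ISO_8859_1, hiraganaA)); // false
        // UTF-8 can encode every Unicode character.
        System.out.println(canShow(StandardCharsets.UTF_8, hiraganaA));      // true

        // A Japanese charset, if this JRE provides it (an assumption, so check first).
        if (Charset.isSupported("Shift_JIS")) {
            System.out.println(canShow(Charset.forName("Shift_JIS"), hiraganaA));
        }
    }
}
```

A character the console's charset cannot encode typically ends up as a question mark in the output.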

Moreover, Windows uses two different codepages: "ANSI" and "OEM" (or, less formally, "Windows" and "DOS" codepages). The console window is set up to use the OEM codepage, which differs from the ANSI one. Since Java's default charset corresponds to the ANSI codepage, you must use a non-default charset corresponding to the OEM codepage when writing to the System.out stream if you want the characters to be readable in a console. I somehow discovered which charset corresponds to the OEM codepage in my environment (so that I can create Java command-line tools that display local characters correctly), but I don't remember how I did that.

However, if you do this and run the application in an IDE like NetBeans (or BlueJ), the program's output in the IDE will be garbled, since IDE windows, like everything else on Windows except consoles, use the ANSI codepage. This can be worked around by passing the charset to the application as a parameter on the command line and using a different parameter inside and outside of the IDE. At least this is how I've solved this issue.
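Roughly, that workaround could look like the sketch below. The names ConsoleOut and withCharset are hypothetical, and so is the convention that the first command-line argument is the charset name. Which name matches your OEM codepage is something you have to find out for your own system.

```java
import java.io.OutputStream;
import java.io.PrintStream;
import java.io.UnsupportedEncodingException;
import java.nio.charset.Charset;

class ConsoleOut {
    // Wrap a stream in an auto-flushing PrintStream that encodes text
    // with the named charset instead of the platform default.
    static PrintStream withCharset(OutputStream out, String charsetName) {
        try {
            return new PrintStream(out, true, charsetName);
        } catch (UnsupportedEncodingException e) {
            throw new IllegalArgumentException("Unknown charset: " + charsetName, e);
        }
    }

    public static void main(String[] args) {
        // Assumed convention: pass the OEM charset name when running in a console,
        // and pass nothing inside the IDE so the default charset is used.
        String charsetName = args.length > 0
                ? args[0]
                : Charset.defaultCharset().name();

        PrintStream console = withCharset(System.out, charsetName);
        console.println("caf\u00E9 na\u00EFve");
    }
}
```

Run it as, for example, `java ConsoleOut IBM850` from a console; IBM850 is only an illustration of a charset name here, not necessarily the right one for your machine.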

It is also possible to set a different codepage for an existing console window: either using a Windows API call (possible from Java via JNI), or by using the chcp command which cmd.exe recognizes (this could be useful, e.g., in a BAT file you'd use to run your application). I assume the meanings of the codepage numbers (the parameter of the chcp command) are available somewhere on Microsoft's pages. I didn't explore this possibility, though, so there might be some other caveats here.

(I'm not sure that my terminology matches Microsoft's terminology exactly. If you research this subject more deeply, you might encounter different terms.)
 