
Language doubt

 
Ankit Saxena
Greenhorn
Posts: 12


The input text is in a different language, let's say Russian.

When I try to generate the Unicode hexadecimal value for it, the text is not recognized and '?' is displayed in place of the text.
The default system file encoding is Cp1252.

Can any1 explain how to do this, or how to change the file encoding?

Thanks.
 
Campbell Ritchie
Sheriff
Pie
Posts: 49447
62
Welcome to JavaRanch

Please read this about why we don't like people writing "any1" or similar.
By no means an easy beginner's question: moving thread.
 
Jesper de Jong
Java Cowboy
Saloon Keeper
Pie
Posts: 15369
40
Android IntelliJ IDE Java Scala Spring
Welcome to JavaRanch.

How are you displaying the text that comes out of your program? Are you printing it in a console window? Be aware that the console window on the English version of Windows by default uses a font that does not support most of the characters that are in Unicode. If you try to print a character on the console that's not in the font, you'll get a '?'.

Changing the file encoding will not solve that problem; the console simply isn't able to display those characters with the default font.
 
Ankit Saxena
Greenhorn
Posts: 12
The problem is that I have to take data from a database that may be in some other language, and then display that data in an RTF document.

The steps that I have come up with are:
1. Fetch the data from the database.
2. Convert it into Unicode hex values.
3. Pass the Unicode hex values to the RTF as a string from the Java code.

For French, German and Italian it works, but for other languages such as Greek or Russian it does not.
 
Campbell Ritchie
Sheriff
Pie
Posts: 49447
62
Try it with a very small dataset in Russian, print out the hex values with the %x tags on the command line, then compare the output with a Unicode set to confirm you actually have Russian letters. Russian is included on this Unicode page.
 
Ankit Saxena
Greenhorn
Posts: 12
I am trying to hold the Russian text in a String, but when I try to convert each char into its Unicode hex value, the characters are treated as '?' and the Unicode hex value of '?' is displayed.

So how can I make the code recognize those characters?
 
Ankit Saxena
Greenhorn
Posts: 12
I am posting the code that I am using.





And the output I am getting is:

original = ???
[B@765291

roundTrip = ???
un=\'3f\'3f\'3f\'3f\'3f\'3f
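
One possible reading of the 3f values above (an assumption, since the posted code did not survive): the text passed through a charset that has no Cyrillic letters, such as the default Cp1252, so every letter was replaced by '?' (0x3f). A minimal sketch of that effect follows. The class name EncodingLossDemo is hypothetical, and the Charset overloads it uses need Java 6 or later.

```java
import java.nio.charset.Charset;

// Hypothetical demo class, not the code from the post above.
class EncodingLossDemo {

    // Encodes the text in the named charset and decodes it again.
    // Characters the charset cannot represent come back as '?'.
    static String roundTrip(String text, String charsetName) {
        Charset cs = Charset.forName(charsetName);
        byte[] bytes = text.getBytes(cs);
        return new String(bytes, cs);
    }

    public static void main(String[] args) {
        // The Russian word from this thread, written with Unicode escapes
        // so that the encoding of the source file does not matter.
        String original = "\u0446\u0438\u0442\u0430\u0442\u0430";

        // Cp1252 has no Cyrillic letters, so the text is lost.
        System.out.println(roundTrip(original, "Cp1252").equals(original));

        // Cp1251 does have them, so the text survives.
        System.out.println(roundTrip(original, "Cp1251").equals(original));

        // First byte of the Cp1252 encoding, printed in hex.
        byte first = original.getBytes(Charset.forName("Cp1252"))[0];
        System.out.println(Integer.toHexString(first & 0xff));
    }
}
```

If the String already contains '?' before any encoding step, the loss happened earlier, for example when the source file or the database driver used the wrong encoding.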
 
Campbell Ritchie
Sheriff
Pie
Posts: 49447
62
Try the String.toCharArray() method to split your writing into chars.
Iterate through the char[] with a for/for-each loop.
Print each character to screen with the %x tags. It may need an int cast. When you compare the hex values with the Unicode page I showed you yesterday, you can check that the correct numbers are shown.
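
The steps above can be sketched roughly as follows. This is only an illustration under assumptions: the class name CharHexDemo and the fallback word are made up here, and String.format with %x needs Java 5 or later.

```java
// Hypothetical sketch of the toCharArray / loop / %x steps.
class CharHexDemo {

    // Formats one char together with its UTF-16 value in hex.
    // The int cast is needed because %x does not accept a char.
    static String describe(char c) {
        return String.format("The char %c has the hex value %x", c, (int) c);
    }

    public static void main(String[] args) {
        // Use the first command-line argument, or a short Russian word
        // written with Unicode escapes if none is given.
        String text = args.length > 0 ? args[0] : "\u0446\u0438\u0442\u0430";
        char[] chars = text.toCharArray();
        for (char c : chars) {
            System.out.println(describe(c));
        }
    }
}
```

Even if the console shows '?' for the character itself, the hex value after it is plain ASCII and can be compared against a Unicode chart.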

 
Campbell Ritchie
Sheriff
Pie
Posts: 49447
62
I put my code into a simple class and executed it, copying and pasting your original word as a test:
java RussianDemo цитата
The char ц has the hex value 446
The char и has the hex value 438
The char т has the hex value 442
The char а has the hex value 430
The char т has the hex value 442
The char а has the hex value 430
You will have to check against the Unicode page, but that seems to be working. It is on a Linux box; the shell supports Unicode.
 
Jesper de Jong
Java Cowboy
Saloon Keeper
Pie
Posts: 15369
40
Android IntelliJ IDE Java Scala Spring
Ankit Saxena wrote: I am posting the code which i am using..

I'll repeat what I already wrote above: you are printing Russian characters to a console window with System.out.println(). The console window (in a normal, English version of Windows) by default uses a font that does not support the Russian characters, so you get question marks instead.

Your program might produce the right output, but if you display it in a console window, you won't see it, because the console window can't display it.
 
Ankit Saxena
Greenhorn
Posts: 12
printf is not working for me. I have JDK version 1.3.
 
Maneesh Godbole
Saloon Keeper
Posts: 11070
13
Android Eclipse IDE Google Web Toolkit Java Mac Ubuntu
printf() was added in 1.5
 
Ankit Saxena
Greenhorn
Posts: 12
But then how can it be done? I only have versions 1.3 and 1.4.
 
Rob Spoor
Sheriff
Pie
Posts: 20552
57
Chrome Eclipse IDE Java Windows
System.out.println("The char " + c + " has the hex value " + (int)c);

The printf method basically just puts the last arguments into the first argument, starting from left to right, replacing any part that starts with % (%% is used to show a single % character). Of course it does allow some more formatting (e.g. %04d to print a number with zeros padded to the left if smaller than 1000), but for the rest it's as easy as the above code.
 
Gamini Sirisena
Ranch Hand
Posts: 378
You could use Integer.toHexString(c); to print out the Unicode code point in hex.

Also, you could write an HTML file with the Unicode code point values as HTML entities, in decimal like &#1094; or in hex like &#x446;, and open it in, say, the latest version of Firefox; you should see the Unicode characters rendered in the browser.

Another way would be to display the Unicode text in some Swing component.
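
A rough sketch of the first two suggestions, written without printf so that it should also suit older JDKs (an assumption; it has not been tried on 1.3). The class name CodePointText and its method names are hypothetical.

```java
// Hypothetical helper class for illustration only.
class CodePointText {

    // Hex value of a char, for example 446 for the Cyrillic letter tse.
    static String hex(char c) {
        return Integer.toHexString(c);
    }

    // Replaces every non-ASCII char with a decimal HTML entity.
    // This works on UTF-16 chars, which is enough for Russian and Greek.
    // ASCII characters such as '<' and '&' are copied unchanged, so this
    // is not a complete HTML escaper.
    static String toHtmlEntities(String text) {
        StringBuffer sb = new StringBuffer();
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (c < 128) {
                sb.append(c);
            } else {
                sb.append("&#").append((int) c).append(';');
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String text = "\u0446\u0438\u0442\u0430\u0442\u0430";
        for (int i = 0; i < text.length(); i++) {
            System.out.println(hex(text.charAt(i)));
        }
        // The result is plain ASCII, so any console can display it.
        System.out.println(toHtmlEntities(text));
    }
}
```

Pasting the entity string into the body of an HTML file and opening it in a browser is one way to check the characters without depending on the console font.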
 
Ankit Saxena
Greenhorn
Posts: 12
First of all, thanks a lot for the suggestions.

I am getting the Unicode value for the characters, but for the char 'ц', for example, the Unicode value is 446 while its value in the Windows-1251 encoding is 'f6'. I need the Windows-1251 value to pass to the RTF file so that it can display the character properly.

So, how do I do this?
 
Gamini Sirisena
Ranch Hand
Posts: 378
I guess this is what you have to do.

OutputStream out = new FileOutputStream("russian.rtf");
OutputStreamWriter os = new OutputStreamWriter(out, "Cp1251");

then use one of the write methods of the OutputStreamWriter to write to the file.

Since you are using JDK 1.3 there is a complication. You will need to get the i18n.jar distributed with the international version of the 1.3 JDK. I am not sure whether 1.3 is still available for download. Hopefully you have it already.

Check the supported character encodings for Java 1.3.
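
If the goal is the \'f6 style of escape rather than raw bytes in the file, the same Cp1251 encoding can be applied in memory. The following is a minimal sketch under assumptions: the class name RtfHexEscaper is hypothetical, the Charset overload needs Java 6 or later, and the JVM must include the Cp1251 charset. The RTF document would also have to declare a matching code page (for example with \ansicpg1251), which this sketch does not do.

```java
import java.nio.charset.Charset;

// Hypothetical helper, a sketch rather than a complete RTF writer.
class RtfHexEscaper {

    // Encodes the text in the named charset and writes every non-ASCII
    // byte, plus the RTF special characters \ { }, as a \'xx hex escape.
    // Other ASCII bytes are copied unchanged.
    static String escape(String text, String charsetName) {
        byte[] bytes = text.getBytes(Charset.forName(charsetName));
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            int v = b & 0xff; // treat the byte as unsigned
            if (v < 0x80 && v != '\\' && v != '{' && v != '}') {
                sb.append((char) v);
            } else {
                // Every value that reaches this branch is 0x5c or larger,
                // so toHexString always yields two digits here.
                sb.append("\\'").append(Integer.toHexString(v));
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String text = "\u0446\u0438\u0442\u0430\u0442\u0430";
        System.out.println(escape(text, "Cp1251"));
    }
}
```

With Cp1251 the letter 'ц' comes out as \'f6, matching the value quoted earlier in the thread. With Cp1252 the same call would produce \'3f-free output of plain '?' characters, because the letters are lost during encoding.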
 