Language doubt

The input text is in a different language, let's say Russian.

When I try to generate the Unicode hexadecimal value for this text, it doesn't recognize the text and displays '?' in its place.
The default system file encoding is Cp1252.

Can any1 explain how to do this, or how to change the file encoding?

Welcome to JavaRanch

Please read this about why we don't like people writing "any1" or similar.
By no means an easy beginner's question: moving thread.
Welcome to JavaRanch.

How are you displaying the text that comes out of your program? Are you printing it in a console window? Be aware that the console window on the English version of Windows by default uses a font that does not support most of the characters that are in Unicode. If you try to print a character on the console that's not in the font, you'll get a '?'.

Changing the file encoding will not solve that problem; the console simply isn't able to display those characters with the default font.
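To check which encoding the JVM is actually using as its default, a minimal sketch like the following prints the file.encoding system property and the encoding that an OutputStreamWriter created without an explicit charset picks up. The class name is made up for illustration. On JDK 18 and later the default is UTF-8 regardless of the Windows locale, so what it prints depends on the Java version.

```java
import java.io.OutputStreamWriter;

class DefaultEncoding {
    // The system property from which the JVM reports its default file encoding
    static String defaultEncoding() {
        return System.getProperty("file.encoding");
    }

    public static void main(String[] args) {
        System.out.println("file.encoding = " + defaultEncoding());
        // A writer created without a charset uses the platform default;
        // getEncoding() reports its historical name, e.g. "Cp1252".
        // It is deliberately not closed, since that would also close System.out.
        OutputStreamWriter w = new OutputStreamWriter(System.out);
        System.out.println("default writer encoding = " + w.getEncoding());
    }
}
```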
The problem is, I have to take data from a database that may be in some other language, and then display that data in an RTF document.

The steps I have come up with are:
1. Fetch the data from the database.
2. Convert it into Unicode hex values.
3. Pass the Unicode hex values to the RTF as a string from the Java code.
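For step 3, one option (my assumption, not something this thread settles on) is to let RTF carry the Unicode value itself: RTF 1.5 and later has a \u control word that takes the character as a signed 16-bit decimal number, followed by a fallback character for readers that don't understand it. A sketch, with a hard-coded string standing in for the database value from step 1; RtfUnicode and toRtf are hypothetical names:

```java
class RtfUnicode {
    // Escapes a String for RTF body text. ASCII passes through (with the RTF
    // special characters backslash and braces escaped); every other char is
    // written as the RTF backslash-u control word with a signed 16-bit decimal
    // value, followed by '?' as the fallback for readers that ignore it.
    static String toRtf(String s) {
        StringBuffer sb = new StringBuffer();
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c == '\\' || c == '{' || c == '}') {
                sb.append('\\').append(c);
            } else if (c < 0x80) {
                sb.append(c);
            } else {
                // (short) c turns code units above 0x7FFF into the negative values RTF expects
                sb.append('\\').append('u').append((int) (short) c).append('?');
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Hypothetical stand-in for a value fetched from the database (Cyrillic "tsitata")
        String fromDb = "\u0446\u0438\u0442\u0430\u0442\u0430";
        System.out.println("{\\rtf1\\ansi " + toRtf(fromDb) + "}");
    }
}
```

The output is pure ASCII, so it survives any file encoding, including Cp1252.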

For French, German and Italian it works, but for other languages like Greek or Russian it doesn't.
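That pattern matches what windows-1252 (Cp1252, the default encoding mentioned above) can represent: it contains the accented Latin letters used in French, German and Italian, but no Greek or Cyrillic letters. So if the text passes through Cp1252 anywhere, that is one plausible place for the '?' to appear. A small check with CharsetEncoder.canEncode (Java 1.4+); the class name is illustrative:

```java
import java.nio.charset.Charset;

class Cp1252Coverage {
    // True if the named charset has a byte representation for c
    static boolean canEncode(String charsetName, char c) {
        return Charset.forName(charsetName).newEncoder().canEncode(c);
    }

    public static void main(String[] args) {
        // e-acute (French), u-umlaut (German), a-grave (Italian), Greek alpha, Cyrillic tse
        char[] samples = { '\u00e9', '\u00fc', '\u00e0', '\u03b1', '\u0446' };
        for (int i = 0; i < samples.length; i++) {
            System.out.println(Integer.toHexString(samples[i]) + " encodable in windows-1252: "
                    + canEncode("windows-1252", samples[i]));
        }
    }
}
```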
Try it with a very small dataset in Russian, print out the hex values with the %x format specifier on the command line, then compare the output with a Unicode chart to confirm you actually have Russian letters. Russian is included on this Unicode page.
I am trying to put the Russian text into a String, but when I try to convert the chars into Unicode hex values, it treats those characters as '?' and displays the Unicode hex value of '?'.

So how can I make the code recognize those characters?
I am posting the code I am using:

And the output I am getting is:

original = ???

roundTrip = ???
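The posted code isn't shown here, so this is only a guess at one way to get "???": if the String is encoded to bytes with Cp1252 (the default on this machine) and decoded again, each Cyrillic letter is replaced by '?' before anything reaches the console. A sketch using java.nio.charset (Java 1.4+), whose encode method is documented to substitute the charset's replacement bytes for unmappable characters; the names are illustrative:

```java
import java.nio.ByteBuffer;
import java.nio.charset.Charset;

class RoundTrip {
    // Encodes s with the named charset and decodes it back;
    // characters the charset cannot represent become its replacement ('?')
    static String roundTrip(String s, String charsetName) {
        Charset cs = Charset.forName(charsetName);
        ByteBuffer bytes = cs.encode(s);
        return cs.decode(bytes).toString();
    }

    public static void main(String[] args) {
        String original = "\u0446\u0438\u0442"; // three Cyrillic letters
        System.out.println("windows-1252: " + roundTrip(original, "windows-1252"));
        System.out.println("windows-1251 preserved: " + roundTrip(original, "windows-1251").equals(original));
        System.out.println("UTF-8 preserved: " + roundTrip(original, "UTF-8").equals(original));
    }
}
```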
Try the String.toCharArray() method to split your text into chars.
Iterate through the char[] with a for or for-each loop.
Print each character to the screen with the %x format specifier; it needs an int cast, because %x does not accept a char argument. When you compare the hex values with the Unicode page I showed you yesterday, you can check that the correct numbers are shown.
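For Java 5 and later, that suggestion can be sketched like this (String.format and printf share the same format syntax). This is not the poster's actual class, which isn't shown; the name RussianDemo is only borrowed from the command line below. Passing a char straight to %x throws an IllegalFormatConversionException, which is why the int cast is there.

```java
class RussianDemo {
    // One line of output: the char itself (%c) and its UTF-16 code unit in hex (%x).
    // %x does not accept a Character, so the char is cast to int.
    static String describe(char c) {
        return String.format("The char %c has the hex value %x", c, (int) c);
    }

    public static void main(String[] args) {
        for (String word : args) {
            for (char c : word.toCharArray()) {
                System.out.println(describe(c));
            }
        }
    }
}
```

Whether the command-line arguments arrive intact depends on the shell's encoding, which is why the output below was produced on a Unicode-capable Linux shell.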

I put my code into a simple class and executed it, copying and pasting your original word as a test:

java RussianDemo цитата
The char ц has the hex value 446
The char и has the hex value 438
The char т has the hex value 442
The char а has the hex value 430
The char т has the hex value 442
The char а has the hex value 430

You will have to check against the Unicode page, but that seems to be working. It is on a Linux box; the shell supports Unicode.
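When the terminal itself expects UTF-8, as Linux shells usually do, it can also help to make Java write UTF-8 explicitly rather than the platform default. A minimal sketch using the PrintStream(OutputStream, boolean, String) constructor (Java 1.4+); the class name is illustrative, and this does not help with a console whose font lacks Cyrillic glyphs:

```java
import java.io.OutputStream;
import java.io.PrintStream;
import java.io.UnsupportedEncodingException;

class Utf8Console {
    // Wraps a stream so that chars are encoded as UTF-8 instead of the platform default
    static PrintStream utf8(OutputStream out) {
        try {
            return new PrintStream(out, true, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // Every Java platform is required to support UTF-8
            throw new IllegalStateException("UTF-8 not supported: " + e);
        }
    }

    public static void main(String[] args) {
        PrintStream out = utf8(System.out);
        out.println("\u0446\u0438\u0442\u0430\u0442\u0430");
    }
}
```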

Ankit Saxena wrote: I am posting the code which i am using..

I'll repeat what I already wrote above: you are printing Russian characters to a console window with System.out.println(). The console window (in a normal, English version of Windows) by default uses a font that does not support the Russian characters, so you get question marks instead.

Your program might produce the right output, but if you display it in a console window, you won't see it, because the console window can't display it.
printf is not working for me. I have JDK version 1.3.
printf() was added in Java 1.5.
But then how can it be done? I only have versions 1.3 and 1.4.
System.out.println("The char " + c + " has the hex value " + Integer.toHexString(c));

The printf method basically just substitutes the remaining arguments, from left to right, into the format string given as the first argument, replacing each part that starts with % (%% is used to show a single % character). Of course it allows some more formatting (e.g. %04d prints a number padded on the left with zeros if it is smaller than 1000), but otherwise it's as simple as the above code.
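A few format strings illustrating that description, assuming Java 5 or later (String.format shares printf's syntax; Locale.ROOT, Java 6+, just keeps %d from using locale-specific digits). The class and method names are illustrative:

```java
import java.util.Locale;

class FormatDemo {
    // Arguments fill the % specifiers from left to right
    static String describe(char c) {
        return String.format(Locale.ROOT, "%c is %x in hex and %d in decimal", c, (int) c, (int) c);
    }

    // %04d pads with leading zeros to a width of four
    static String pad4(int n) {
        return String.format(Locale.ROOT, "%04d", n);
    }

    // %% produces a single percent sign
    static String percent(int n) {
        return String.format(Locale.ROOT, "%d%%", n);
    }

    public static void main(String[] args) {
        System.out.println(describe('A')); // A is 41 in hex and 65 in decimal
        System.out.println(pad4(42));      // 0042
        System.out.println(percent(50));   // 50%
    }
}
```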
You could use Integer.toHexString(c) to print out the Unicode code point in hex.
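Without printf there is no %04x, but Integer.toHexString plus a little manual padding gives the same four-digit form. A sketch that sticks to 1.3-era APIs (no generics, no for-each, StringBuffer rather than StringBuilder); hex4 is a made-up helper name:

```java
class HexNoPrintf {
    // Zero-pads Integer.toHexString to four digits, like %04x would
    static String hex4(char c) {
        String h = Integer.toHexString(c);
        StringBuffer sb = new StringBuffer();
        for (int i = h.length(); i < 4; i++) {
            sb.append('0');
        }
        return sb.append(h).toString();
    }

    public static void main(String[] args) {
        for (int a = 0; a < args.length; a++) {
            char[] chars = args[a].toCharArray();
            for (int i = 0; i < chars.length; i++) {
                System.out.println("The char " + chars[i] + " has the hex value " + hex4(chars[i]));
            }
        }
    }
}
```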

You could also write an HTML file with the Unicode code point values as HTML entities, either in decimal like &#1094; or in hex like &#x446;, and open it in, say, the latest version of Firefox; you should see the Unicode characters rendered in the browser.
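A sketch that produces such a file using hex references; the file name russian.html and the toEntities helper are just examples. It walks code points (Java 5+) so characters outside the Basic Multilingual Plane come out as one reference, and it escapes &, < and >. Because the result is pure ASCII, the file can be written as US-ASCII and no file encoding question arises.

```java
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.io.Writer;

class HtmlEntities {
    // Non-ASCII code points become hex numeric character references like &#x446;
    static String toEntities(String s) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < s.length(); ) {
            int cp = s.codePointAt(i);
            i += Character.charCount(cp);
            if (cp == '&') {
                sb.append("&amp;");
            } else if (cp == '<') {
                sb.append("&lt;");
            } else if (cp == '>') {
                sb.append("&gt;");
            } else if (cp < 0x80) {
                sb.append((char) cp);
            } else {
                sb.append("&#x").append(Integer.toHexString(cp)).append(';');
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) throws IOException {
        String html = "<html><body><p>" + toEntities("\u0446\u0438\u0442\u0430\u0442\u0430")
                + "</p></body></html>";
        Writer w = new OutputStreamWriter(new FileOutputStream("russian.html"), "US-ASCII");
        try {
            w.write(html);
        } finally {
            w.close();
        }
    }
}
```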

Another way would be to display the Unicode text in a Swing component.
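A minimal sketch of that idea: Swing draws the String's chars directly with a font, so the console's encoding and font play no part. It assumes the logical "Dialog" font maps to a physical font with Cyrillic glyphs on the machine in question; Font.canDisplayUpTo is used to report when it doesn't. The class name is illustrative.

```java
import java.awt.Font;
import javax.swing.JFrame;
import javax.swing.JLabel;
import javax.swing.SwingUtilities;

class SwingCyrillic {
    // Creates a label for the given text in a larger font
    static JLabel makeLabel(String text) {
        JLabel label = new JLabel(text);
        // Assumption: "Dialog" resolves to a physical font that has Cyrillic glyphs
        label.setFont(new Font("Dialog", Font.PLAIN, 24));
        return label;
    }

    public static void main(String[] args) {
        final String text = "\u0446\u0438\u0442\u0430\u0442\u0430";
        SwingUtilities.invokeLater(new Runnable() {
            public void run() {
                JLabel label = makeLabel(text);
                // canDisplayUpTo returns -1 when the font has glyphs for every char
                if (label.getFont().canDisplayUpTo(text) != -1) {
                    System.err.println("Font is missing some glyphs");
                }
                JFrame frame = new JFrame("Cyrillic test");
                frame.setDefaultCloseOperation(JFrame.EXIT_ON_CLOSE);
                frame.getContentPane().add(label);
                frame.pack();
                frame.setVisible(true);
            }
        });
    }
}
```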
First of all, thanks a lot for the suggestions.

I am getting the Unicode values for the characters, but, for example, for the char 'ц' the value is 446, whereas in the Windows-1251 encoding its value is 'f6'. I need this value to pass it to the RTF file so that it can display that character properly.

So, how do I do this?
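One way to get that byte value is to encode the String with Java's Cp1251 encoder and format each byte in hex. For the RTF side, this sketch assumes the document declares code page 1251 (for example with \ansicpg1251 in the header), in which case RTF's \'hh control symbol stands for byte hh in that code page. Cp1251Rtf and toRtfCp1251 are hypothetical names, not an existing API.

```java
import java.io.UnsupportedEncodingException;

class Cp1251Rtf {
    // Encodes s as windows-1251 (Java name "Cp1251"); on a 1.3 JRE this needs i18n.jar
    static byte[] cp1251(String s) {
        try {
            return s.getBytes("Cp1251");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException("Cp1251 not supported by this JRE");
        }
    }

    // Builds RTF text: ASCII bytes as-is (RTF specials escaped), other bytes
    // as the control symbol backslash-quote followed by two hex digits
    static String toRtfCp1251(String s) {
        byte[] bytes = cp1251(s);
        StringBuffer sb = new StringBuffer();
        for (int i = 0; i < bytes.length; i++) {
            int b = bytes[i] & 0xFF; // Java bytes are signed; mask to 0..255
            if (b >= 0x80) {
                sb.append("\\'").append(Integer.toHexString(b)); // always two digits here
            } else {
                char c = (char) b;
                if (c == '\\' || c == '{' || c == '}') {
                    sb.append('\\');
                }
                sb.append(c);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Cyrillic "tsitata"; expected RTF body: \'f6\'e8\'f2\'e0\'f2\'e0
        System.out.println(toRtfCp1251("\u0446\u0438\u0442\u0430\u0442\u0430"));
    }
}
```

Characters with no windows-1251 mapping (Greek, for instance) typically come out as '?', so this only covers Cyrillic text.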
I guess this is what you have to do.

OutputStream out = new FileOutputStream("russian.rtf");
OutputStreamWriter os = new OutputStreamWriter(out, "Cp1251");

Then use one of the write methods of the OutputStreamWriter to write to the file.
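Put together into something complete, as a sketch: flush the writer so the encoder's buffered bytes reach the file, and close the underlying stream in a finally block (1.3-compatible, no try-with-resources). The writeCp1251 helper is illustrative, and a real RTF file would also need the RTF header and control words around the text.

```java
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.io.OutputStreamWriter;
import java.io.Writer;

class WriteCp1251 {
    // Writes text to file using the windows-1251 encoding ("Cp1251")
    static void writeCp1251(File file, String text) throws IOException {
        OutputStream out = new FileOutputStream(file);
        try {
            Writer w = new OutputStreamWriter(out, "Cp1251");
            w.write(text);
            w.flush(); // push the encoder's buffered bytes into the file stream
        } finally {
            out.close();
        }
    }

    public static void main(String[] args) throws IOException {
        writeCp1251(new File("russian.rtf"), "\u0446\u0438\u0442\u0430\u0442\u0430");
    }
}
```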

Since you are using JDK 1.3, there is a complication: you will need the i18n.jar distributed with the international version of the 1.3 JDK. I am not sure whether 1.3 is still available for download. Hopefully you have it already.

Check the supported character encodings for Java 1.3.
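On 1.4 and later, the installed encodings can also be listed at run time with java.nio.charset.Charset. That package does not exist in 1.3, where the practical test is to try an encoding name and catch UnsupportedEncodingException. A sketch, written without generics so it stays close to 1.4 style; the class name is illustrative:

```java
import java.nio.charset.Charset;
import java.util.Iterator;

class ListCharsets {
    // True if this JRE has an encoder/decoder for the named charset
    static boolean supports(String name) {
        return Charset.isSupported(name);
    }

    public static void main(String[] args) {
        // availableCharsets() returns a map sorted by canonical charset name
        Iterator it = Charset.availableCharsets().keySet().iterator();
        while (it.hasNext()) {
            System.out.println(it.next());
        }
        System.out.println("windows-1251 supported: " + supports("windows-1251"));
    }
}
```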
