
Converting special characters in a String to equivalent Unicode Escape Code

 
Ripan Singh
Greenhorn
Posts: 5
Hi

I want to convert special characters (e.g. accented Latin letters) in a String to their equivalent Unicode escapes.
Example: the String "JØran" should be converted to "J\u00D8ran".

Does Java provide any function or utility for this, or is there a third-party library?
Thanks in advance.
 
Vijay Tidake
Ranch Hand
Posts: 148
Hi,

You can use the native2ascii.exe tool in the <JAVA_HOME>/bin directory.

The command is: native2ascii -encoding UTF-8 <text file containing JØran> <output text file for the Unicode escapes>
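Note that the native2ascii tool was removed from the JDK in Java 9, so the command above needs Java 8 or earlier. The following is a hypothetical Java 11+ stand-in for that command, not the tool itself: the class name Native2AsciiSketch, the rule "escape every char at or above 0x80" and the uppercase hex digits (chosen to match the question's example) are assumptions, and the real tool's output may differ in detail.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical in-program stand-in for
//   native2ascii -encoding UTF-8 input.txt output.txt
// Reads a UTF-8 file and writes an ASCII file in which every non-ASCII
// char is replaced by a Java-style escape with 4 uppercase hex digits.
class Native2AsciiSketch {

    static String toAscii(String text) {
        StringBuilder sb = new StringBuilder(text.length());
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (c < 0x80) {
                sb.append(c);                                  // ASCII passes through unchanged
            } else {
                sb.append(String.format("\\u%04X", (int) c)); // e.g. U+00D8 becomes a 6-char escape
            }
        }
        return sb.toString();
    }

    static void convert(Path in, Path out) throws IOException {
        String text = Files.readString(in, StandardCharsets.UTF_8);       // Java 11+
        Files.writeString(out, toAscii(text), StandardCharsets.US_ASCII);
    }

    public static void main(String[] args) throws IOException {
        convert(Path.of(args[0]), Path.of(args[1]));
    }
}
```

Usage would be `java Native2AsciiSketch input.txt output.txt`. For String values already inside a program, toAscii can be called directly without any files.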

Thanks
 
Ripan Singh
Greenhorn
Posts: 5
I want to convert the special characters inside a Java program, where I have several independent String values.
I don't have a text file to convert.
Doesn't Java provide a class or method that can operate directly on a String?
 
Campbell Ritchie
Marshal
Posts: 56518
You don't need anything special. You think that a char is a character, but it isn't: it is an unsigned 16-bit integer. The Character class has all sorts of methods that let you see which range a char is in. If it is in a particular range, you can convert it to hex and prefix it with \u. Then use a StringBuilder to put everything back together.
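A minimal sketch of those points, assuming the Latin-1 Supplement block (U+0080 to U+00FF) is the range of interest: casting a char to int exposes its numeric value, Character.UnicodeBlock.of reports which block the char belongs to, and String.format turns the value into four hex digits after \u. The class and method names are hypothetical.

```java
// Shows that a char is just an unsigned 16-bit number, and one possible
// way (an assumption, not the only one) to test which range it is in.
class CharRangeDemo {

    // Hypothetical helper: true if c lies in the Latin-1 Supplement block
    // (U+0080..U+00FF), which contains O-stroke, sharp s, u-umlaut, etc.
    static boolean isLatin1Supplement(char c) {
        return Character.UnicodeBlock.of(c) == Character.UnicodeBlock.LATIN_1_SUPPLEMENT;
    }

    // Formats one char as a Java-style escape with 4 lowercase hex digits.
    static String toEscape(char c) {
        return String.format("\\u%04x", (int) c);
    }

    public static void main(String[] args) {
        char c = '\u00d8';                          // O-stroke, written as an escape to keep the source ASCII
        System.out.println((int) c);                // 216
        System.out.println(Integer.toHexString(c)); // d8
        System.out.println(isLatin1Supplement(c));  // true
        System.out.println(toEscape(c));
    }
}
```

The last line prints the six characters \u00d8, which is the escape the question asks for.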
 
Ripan Singh
Greenhorn
Posts: 5
Hi Campbell

It would be highly appreciated if you could write some sample code for this.
Example: converting the String "JØran" to "J\u00D8ran".
 
Campbell Ritchie
Marshal
Posts: 56518
I did. It took me about ten minutes. 31 lines, or 28 if you leave out the blank lines.
campbell@computer_name:~/java> java UnicodeCreator "Campbell Ritchie ßüÜöÖäÄ¼ JØran"
Campbell Ritchie \u00df\u00fc\u00dc\u00f6\u00d6\u00e4\u00c4\u00bc J\u00d8ran
We don't provide ready-made code.
 
Ripan Singh
Greenhorn
Posts: 5
This is really super cool.

I know the whole code can't be shared as per the policy; could you please give me a few of the core lines so I can get an idea?
 
Campbell Ritchie
Marshal
Posts: 56518
Vijay Tidake wrote: . . . native2ascii.exe in the <JAVA_HOME>/bin directory. . . .
I never knew about that. Thank you. It works nicely; all non-ASCII characters are changed to their Unicode® escapes.
 
Campbell Ritchie
Marshal
Posts: 56518
You get the chars from the String as a char[].
You iterate that array; if a char is in your "normal" range, you append it to a StringBuilder.
If it is outwith your "normal" range, you append \u and its 4-digit hex representation to the StringBuilder.
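Campbell's three steps can be sketched like this. The "normal" range is an assumption here (7-bit ASCII, i.e. below 0x80), the names UnicodeEscaper, isNormal and escape are hypothetical, and lowercase hex is used to match his sample output.

```java
// Sketch of the three steps: get the char[], iterate it, and either copy
// each char or append its escape. The "normal" range is assumed to be ASCII.
class UnicodeEscaper {

    // Assumption: anything below 0x80 counts as "normal" and is left alone.
    static boolean isNormal(char c) {
        return c < 0x80;
    }

    static String escape(String s) {
        char[] chars = s.toCharArray();                 // step 1: the chars as a char[]
        StringBuilder sb = new StringBuilder(chars.length);
        for (char c : chars) {                          // step 2: iterate the array
            if (isNormal(c)) {
                sb.append(c);                           // inside the normal range: copy it
            } else {
                // step 3: outside it, append a backslash, 'u' and 4 hex digits
                sb.append("\\u").append(String.format("%04x", (int) c));
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        for (String arg : args) {
            System.out.println(escape(arg));
        }
    }
}
```

Run as java UnicodeEscaper "Campbell Ritchie ßüÜöÖäÄ¼ JØran", it should reproduce the output line shown earlier in the thread, provided the terminal and the JVM agree on the character encoding.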
 
Campbell Ritchie
Marshal
Posts: 56518
I don't know how that will work for characters and glyphs whose Unicode® value is > 0xffff (65535).
 
Rob Spoor
Sheriff
Posts: 21131
Those aren't valid Java chars anyway, as a Java char only goes from 0 to 65535.
 
Ripan Singh
Greenhorn
Posts: 5
Thanks a lot, Campbell. I got it.
 
Campbell Ritchie
Marshal
Posts: 56518
Well done
 
Campbell Ritchie
Marshal
Posts: 56518
Rob Spoor wrote:Those aren't valid Java chars anyway, as Java only goes from 0 to 65535.
They are formed from two chars put together to form a code point, which is of type int. You can probably iterate the String getting code points, some of which would be > 0xffff.
As I said, I don't know how my technique ought to handle them. You could split them with i & 0xffff or i >> 0x10 & 0xffff. Remember >> has a higher precedence than &.
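One caution on the splitting idea: for a code point above 0xFFFF, i & 0xffff and i >> 0x10 & 0xffff are not the two chars a String actually stores. For U+1F600 they give 0xF600 and 0x1, whereas the String holds the UTF-16 surrogate pair 0xD83D and 0xDE00, and the Java source escape for that code point is \uD83D\uDE00. The sketch below (class and method names hypothetical, ASCII assumed to be the "normal" range) iterates code points with String.codePoints() and uses Character.highSurrogate and Character.lowSurrogate to emit the pair. A plain char-by-char loop would produce the same escapes, since each surrogate is a char of its own; this version just makes the case explicit.

```java
// Sketch: escape by code point so characters above U+FFFF are handled
// explicitly. A supplementary code point is written as its UTF-16
// surrogate pair, which is how Java source escapes represent it.
class CodePointEscaper {

    static String hex4(char c) {
        return String.format("\\u%04x", (int) c);
    }

    static String escape(String s) {
        StringBuilder sb = new StringBuilder();
        s.codePoints().forEach(cp -> {
            if (cp < 0x80) {
                sb.appendCodePoint(cp);                      // ASCII: copy unchanged
            } else if (Character.isSupplementaryCodePoint(cp)) {
                // above 0xFFFF: two escapes, one per surrogate
                sb.append(hex4(Character.highSurrogate(cp)))
                  .append(hex4(Character.lowSurrogate(cp)));
            } else {
                sb.append(hex4((char) cp));                  // BMP: a single escape
            }
        });
        return sb.toString();
    }

    public static void main(String[] args) {
        String smiley = new String(Character.toChars(0x1F600));
        System.out.println(escape("J\u00d8ran " + smiley));

        // Masking the code point does NOT give the surrogates:
        int cp = 0x1F600;
        System.out.println(Integer.toHexString(cp >> 0x10 & 0xffff) + " "
                + Integer.toHexString(cp & 0xffff));
    }
}
```

Its main method prints J\u00d8ran \ud83d\ude00 on the first line and 1 f600 on the second, showing the difference between the surrogates and the masked halves.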
 
Tiya Khambadkone
Ranch Hand
Posts: 114
I tried converting 'ü' and 'Ñ' to their Unicode escapes using the code at the following URL (similar to the solution Campbell suggested), but I get the same Unicode value for both special characters.

https://docs.oracle.com/javase/tutorial/i18n/text/string.html
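The code and output from this post did not survive, so the cause can only be guessed. One common way for different letters to end up with the same value is a lossy trip through bytes: String.getBytes with a charset that cannot represent a character substitutes that charset's replacement (for US-ASCII, '?' = 0x3F), and decoding malformed bytes yields U+FFFD. The hypothetical SameValueDiagnosis class below shows the first case, with the literals written as escapes so the result does not depend on the source file's encoding.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical diagnostic: escapes u-umlaut and N-tilde before and after
// a lossy round trip through US-ASCII bytes.
class SameValueDiagnosis {

    // Escapes every char, including ASCII, so the values are easy to compare.
    static String escapeAll(String s) {
        StringBuilder sb = new StringBuilder();
        for (char c : s.toCharArray()) {
            sb.append(String.format("\\u%04x", (int) c));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String original = "\u00fc\u00d1";                              // u-umlaut, N-tilde
        byte[] ascii = original.getBytes(StandardCharsets.US_ASCII);   // unmappable chars become '?'
        String damaged = new String(ascii, StandardCharsets.US_ASCII);

        System.out.println(escapeAll(original)); // two distinct values: 00fc and 00d1
        System.out.println(escapeAll(damaged));  // both are now 003f ('?')
    }
}
```

If the escapes come out as 003f or fffd for every special character, checking each charset the text passes through (source file, javac -encoding, byte conversions, console) is a reasonable next step.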
 