Convert from \uXXXX format to corresponding character

 
Ranch Hand
Posts: 80
Dear all,
A quick question: I have an HTML file that contains English text plus non-ASCII characters written as \uXXXX escapes, for example:
<table align="center" cellspacing="1" width="97%" cellpadding="1" border="0">
<tr>
<td align="center" >\u627\u644\u645\u645\u62B\u644\u629
</td>
</tr>
</table>
All I want is to read the file and convert each \uXXXX sequence to the character it represents. How can I do that?
Thanks
 
Wanderer
Posts: 18671
First off, you need to be sure you know what format you're using here. You say it's \uXXXX with four digits, but your example consistently uses three digits. So I know it's not the same format as a Java unicode escape, which always uses four, but I'm not sure what format it really is. Are there always three chars in the sequence? Are they treated as hexadecimal? Is there an escape sequence for a plain \, e.g. \\?
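To check the hexadecimal reading against the example itself, one can parse each three-character code with radix 16 and look up the resulting code point. A minimal sketch (the class and method names are illustrative; Character.getName needs Java 7 or later):

```java
class EscapeFormatCheck {
    // Reads a code as hexadecimal and describes the character it names.
    static String describe(String code) {
        int cp = Integer.parseInt(code, 16);   // "62B" parses only as hex
        return String.format("%s -> U+%04X %s (%s)",
                code, cp, Character.getName(cp), Character.UnicodeBlock.of(cp));
    }

    public static void main(String[] args) {
        // The codes from the example HTML, without the leading backslash and 'u'.
        String[] codes = {"627", "644", "645", "645", "62B", "644", "629"};
        for (String code : codes) {
            System.out.println(describe(code));
        }
    }
}
```

All seven codes land in the Arabic block (U+0600 to U+06FF), which fits the hexadecimal reading; 62B could not be a decimal number in any case.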
Here's some code which assumes that a valid \u escape is followed by exactly 3 hexadecimal characters, and that \\ is an escape for \:
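A minimal sketch of a decoder built on exactly those two assumptions; it illustrates the approach and is not necessarily identical to the code originally posted. The class name EscapeDecoder is hypothetical, and Matcher.quoteReplacement needs Java 5 or later:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class EscapeDecoder {
    // Assumed format (hypothetical, matching the assumptions above):
    //   two backslashes        -> one literal backslash
    //   backslash, 'u', 3 hex  -> the character with that code
    // Matching proceeds left to right, so a double backslash is consumed as
    // a pair and its second backslash never starts a character escape.
    private static final Pattern ESCAPE =
            Pattern.compile("\\\\\\\\|\\\\u([0-9A-Fa-f]{3})");

    static String decode(String input) {
        Matcher m = ESCAPE.matcher(input);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            String replacement = (m.group(1) == null)
                    ? "\\"                                        // escaped backslash
                    : String.valueOf((char) Integer.parseInt(m.group(1), 16));
            // quoteReplacement keeps '\' and '$' from being interpreted
            // by appendReplacement itself.
            m.appendReplacement(out, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        String html = "<td align=\"center\" >\\u627\\u644\\u645\\u645\\u62B\\u644\\u629";
        String decoded = decode(html);
        // Print code points rather than glyphs, since consoles vary.
        for (char c : decoded.substring(decoded.indexOf('>') + 1).toCharArray()) {
            System.out.printf("U+%04X ", (int) c);
        }
        System.out.println();
    }
}
```

Running main prints U+0627 U+0644 U+0645 U+0645 U+062B U+0644 U+0629, i.e. the Arabic letters from the example. To process a whole file, read it into a String with the correct encoding and pass it to decode.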

If you're not familiar with regular expressions, now's a good time to learn. The standard reference is Mastering Regular Expressions by Jeffrey Friedl; you may also want to check out Real World Regular Expressions with Java 1.4 by our own Max Habibi when it's released (soon I imagine). Or just study the java.util.regex API very carefully; that worked pretty well for me until I finally got around to reading Friedl.
Note that all these \\ sequences can be confusing: javac uses \ as an escape character, so does the regex package, and now so does the format you're parsing. So to represent an escape sequence of \\ in the HTML, the regex engine needs to see \\\\, which means javac needs to see a String literal with \\\\\\\\. Confusing at first, but it works.
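The layers of escaping can be checked directly. This small sketch (class name hypothetical) counts the characters that survive javac and confirms that the four-backslash regex matches the two backslashes found in the HTML:

```java
import java.util.regex.Pattern;

class EscapeLayers {
    public static void main(String[] args) {
        // What the HTML contains: two backslash characters.
        String htmlText = "\\\\";        // javac turns 4 source chars into 2
        // What the regex engine must see: four backslashes (two escaped ones).
        String regex = "\\\\\\\\";       // javac turns 8 source chars into 4
        System.out.println(htmlText.length());          // 2
        System.out.println(regex.length());             // 4
        System.out.println(htmlText.matches(regex));    // true
        // Pattern.quote removes the regex layer, leaving only javac's escaping.
        System.out.println(htmlText.matches(Pattern.quote(htmlText))); // true
    }
}
```

Pattern.quote is one way to skip the regex layer when matching a fixed string; for replacement strings, Matcher.quoteReplacement plays the same role.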
[ December 28, 2003: Message edited by: Jim Yingst ]
 
Ashraf Fouad
Ranch Hand
Posts: 80
Thanks,
Since I needed this quickly, I looked for another way to handle the \uXXXX format. ResourceBundle already understands it when it reads property files, so I searched the Java source code in JDK_PATH\src.jar for the code that does this and found it:
java.util.Properties.loadConvert --> converts \uXXXX escapes to Unicode characters
java.util.Properties.saveConvert --> converts Unicode characters to \uXXXX escapes
It works really well.
Thanks very much for your help.
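Note that loadConvert and saveConvert are private methods of java.util.Properties, so they cannot be called directly; the code has to be copied, or the same decoding can be reached through the public Properties.load. A minimal sketch under these assumptions: every escape uses exactly four hex digits (so \u627 would first have to become \u0627), the text is a single line, and the escape helper is a hypothetical hand-written counterpart to saveConvert, not the JDK's own code:

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.Properties;

class PropertiesEscapes {
    // Decodes backslash-u escapes with the public Properties.load(Reader)
    // (Java 6+), which applies the same escape processing as .properties files.
    // Caveats of this trick: each escape needs exactly four hex digits,
    // other backslash escapes (tab, newline, ...) are processed too,
    // leading whitespace is dropped, and only single-line input is safe.
    static String unescape(String text) throws IOException {
        Properties p = new Properties();
        p.load(new StringReader("k=" + text));
        return p.getProperty("k");
    }

    // Hypothetical counterpart in the spirit of saveConvert: doubles each
    // backslash and writes every char outside printable ASCII as a
    // backslash, 'u', and four uppercase hex digits.
    static String escape(String text) {
        StringBuilder sb = new StringBuilder();
        for (char c : text.toCharArray()) {
            if (c == '\\') {
                sb.append("\\\\");
            } else if (c < 0x20 || c > 0x7E) {
                sb.append(String.format("\\u%04X", (int) c));
            } else {
                sb.append(c);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) throws IOException {
        String escaped = "\\u0627\\u0644\\u0645\\u0645\\u062B\\u0644\\u0629";
        String decoded = unescape(escaped);
        System.out.println(decoded.length());                // 7
        System.out.println(escape(decoded).equals(escaped)); // true
    }
}
```

On JDKs older than 6, Properties.load(InputStream) performs the same escape processing, reading the bytes as ISO 8859-1.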
 