Apply Unicode escapes?

 
Ranch Hand
Hi,

Is there an easy way to apply Unicode escapes to a text string, so that each \u#### is replaced by the equivalent Unicode character?

Basically, I need to replicate the behaviour of Properties files without actually using Properties.load().
 
Peter Chase
Ranch Hand
No-one replied, and my own further research suggests there probably isn't such an API method. So I wrote my own. I don't think my employer will mind my posting it here, for the edification/scrutiny of Ranchers...



Something like that, anyway. Testing may reveal shortcomings.
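
The code from this post did not survive in this copy of the thread, so what follows is only a minimal sketch of the kind of method described, not the poster's actual code. It assumes a regex-based approach (a later reply mentions a regex). The class name `SimpleUnicodeUnescaper` and the method name `unescape` are made up for illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class SimpleUnicodeUnescaper {
    // Regex: a backslash, the letter u, then exactly four hex digits (captured).
    private static final Pattern ESCAPE = Pattern.compile("\\\\u([0-9a-fA-F]{4})");

    // Replaces every escape sequence in the text with the character it names.
    // Assumption: no special handling of doubled backslashes.
    static String unescape(String text) {
        Matcher m = ESCAPE.matcher(text);
        // StringBuffer keeps this compatible with Java versions before 9.
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            char decoded = (char) Integer.parseInt(m.group(1), 16);
            // quoteReplacement stops a decoded '$' or backslash from being
            // interpreted as replacement syntax by appendReplacement.
            m.appendReplacement(out, Matcher.quoteReplacement(String.valueOf(decoded)));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        // The Java literal below is the 8-character text: backslash, u, 0041, BC.
        System.out.println(unescape("\\u0041BC")); // prints ABC
    }
}
```

A sketch like this decodes every matching sequence, including one that is itself preceded by a backslash.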
 
Wanderer
One thing to beware of here is that it's possible to have a double-backslash escape. So

\u####

is a Unicode escape, but

\\u####

is not. Or even

\\\\\\\\\u####

is a Unicode escape, but

\\\\\\\\\\u####

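is not. In general, the sequence is a Unicode escape only when the run of backslashes immediately before the u has odd length.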
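
The backslash-counting rule in the examples above can be sketched as a simple left-to-right scan. This is only an illustration under stated assumptions: the class name `ParityAwareUnescaper` and its methods are hypothetical, and the sketch leaves doubled backslashes in the output as they are rather than collapsing them the way Properties.load() would.

```java
final class ParityAwareUnescaper {
    // Decodes an escape only when the backslash that starts it is preceded
    // by an even number (including zero) of contiguous backslashes.
    static String unescape(String text) {
        StringBuilder out = new StringBuilder(text.length());
        int backslashes = 0; // contiguous backslashes immediately before index i
        int i = 0;
        while (i < text.length()) {
            char c = text.charAt(i);
            if (c == '\\' && backslashes % 2 == 0 && isEscapeAt(text, i)) {
                out.append((char) Integer.parseInt(text.substring(i + 2, i + 6), 16));
                i += 6;          // skip backslash, u, and four hex digits
                backslashes = 0; // a decoded character does not extend the run
                continue;
            }
            backslashes = (c == '\\') ? backslashes + 1 : 0;
            out.append(c);
            i++;
        }
        return out.toString();
    }

    // True if index i holds a backslash followed by u and four hex digits.
    private static boolean isEscapeAt(String s, int i) {
        if (i + 6 > s.length() || s.charAt(i + 1) != 'u') {
            return false;
        }
        for (int k = i + 2; k < i + 6; k++) {
            if (Character.digit(s.charAt(k), 16) < 0) {
                return false;
            }
        }
        return true;
    }
}
```

With this sketch, one or three backslashes before the u lead to a decoded character, while two backslashes leave the text unchanged.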
 
Peter Chase
Ranch Hand
Yes, that's true, and if I were writing a method for a totally general application, I'd have to deal with that. In my situation, I am pretty sure that \u followed by 4 hex digits will only appear in the string if an escape code is intended.

The most common situation where problems occur with this is where the text being processed is actually an explanation of Unicode escapes! I can be sure my text won't be that.

As it is fairly easy to do, I could perhaps beef up my regex so that it does not match if the text being matched is preceded by another backslash. That's still not perfect, as your loads-of-backslashes examples showed, but it would be a step in the right direction.
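
A regex with a negative lookbehind is one way to express "do not match if preceded by another backslash". The following is a minimal sketch under that assumption, not the poster's code. The class name `LookbehindUnescaper` and the method name `unescape` are hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class LookbehindUnescaper {
    // (?<!\\) is a negative lookbehind: the escape must not be immediately
    // preceded by a backslash. The rest matches backslash, u, four hex digits.
    private static final Pattern ESCAPE =
            Pattern.compile("(?<!\\\\)\\\\u([0-9a-fA-F]{4})");

    static String unescape(String text) {
        Matcher m = ESCAPE.matcher(text);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            char decoded = (char) Integer.parseInt(m.group(1), 16);
            // Quote the replacement so a decoded '$' or backslash is taken literally.
            m.appendReplacement(out, Matcher.quoteReplacement(String.valueOf(decoded)));
        }
        m.appendTail(out);
        return out.toString();
    }
}
```

This skips the two-backslash case correctly, but it also skips the three-backslash case, which the backslash-counting rule says should be decoded. That is the remaining imperfection mentioned above.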