
Problem reading Unicode Characters of a Web Page

 
Chetan Pandey
Ranch Hand
Posts: 31
Hi All:

I am using FileInputStream to read from a URL which is full of Unicode Characters.

But these Unicode Characters end up as : 'ā'.

I can live with this, and I want to convert them to their Unicode equivalents by replacing '&#x' with '\u' and getting rid of the ';'.

But how?

I can't do String s = "\u" + "0101"; because "\u" on its own is not a valid escape and will not compile.

All help is appreciated.

Thanks.

Chetan
 
Chetan Pandey
Ranch Hand
Posts: 31
Oops. I just realized that my attempt to write '&_#_x_0_1_0_1;' (without the underscores) showed up on the JavaRanch web page as 'ā'.

My question is: once I have extracted the '0101' from the HTML string, how do I turn it into the character '\u0101'?

Thanks.

Chetan
 
Frederico Bruno
Greenhorn
Posts: 10
You need to escape the backslash.
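
One caveat: escaping the backslash ("\\u" + "0101") compiles, but it gives you the six-character text \u0101, not the single character ā. The compiler only translates \uXXXX escapes that appear in the source file, so it cannot do this at run time. If what you want is the character itself, one option is to parse the hex digits as a number and convert that code point to a String. Below is a minimal sketch of that idea. The class name NcrDecoder and the methods fromHex and decode are made up for illustration, and it only handles hexadecimal references of the form &#x...; (not decimal ones or named entities).

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class, not part of any library.
class NcrDecoder {
    // Matches hexadecimal numeric character references such as &#x0101;
    private static final Pattern HEX_NCR =
            Pattern.compile("&#[xX]([0-9a-fA-F]{1,6});");

    // Turns already-extracted hex digits (e.g. "0101") into the text they denote.
    // Character.toChars throws IllegalArgumentException for values that are
    // not valid Unicode code points.
    static String fromHex(String hexDigits) {
        int codePoint = Integer.parseInt(hexDigits, 16);
        return new String(Character.toChars(codePoint));
    }

    // Replaces every hexadecimal reference in the input with its character.
    static String decode(String html) {
        Matcher m = HEX_NCR.matcher(html);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            // quoteReplacement guards against '$' or a backslash in the result
            m.appendReplacement(out, Matcher.quoteReplacement(fromHex(m.group(1))));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(fromHex("0101").equals("\u0101")); // prints true
        System.out.println(decode("P&#x0101;li"));
    }
}
```

For the single value from the question, fromHex("0101") is all that is needed. decode is there for running over a whole page of text in one pass.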
 