
How to convert ISO-8859-1 to normal characters in JAVA

 
Ranch Hand
Posts: 64
Hi,

How do I convert the string "&#188;&#198;&#208;&#230;" into "¼ÆÐæ"? (I'm not able to post the string as typed: the forum keeps converting it. The first character is '&' followed by '#' followed by '188'.) Please help.


Regards,
Vijay
 
Bartender
Posts: 1210
There is nothing abnormal about UTF-8.
Please read these 2 articles on characters and related concepts like encodings and glyphs - article #1 and article #2.
With that understanding, come back and explain what exactly is the problem you are having. "Convert UTF-8 to normal characters" is highly unlikely to be the actual problem or a solution to the problem.
 
vdammala vkumar
Ranch Hand
Posts: 64
Hi Karthik,

Thanks for responding to my query.

I haven't gone through those articles yet; I will start looking into them now.

Please have a look at this URL: http://www.w3schools.com/charsets/ref_html_8859.asp

I am trying to convert the "EntityNumber" given in that url into "Character".
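If the only goal is to turn the decimal entity numbers from that table (such as &#188;) back into characters, a regular expression is enough. The following is a minimal sketch that assumes only decimal references need handling; the class name NumericEntityDecoder is hypothetical. It relies on the fact that a numeric reference names a Unicode code point, so &#188; is simply character 188.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: turns decimal numeric character references such as
// "&#188;" back into the characters they stand for. Hex (&#xBC;) and named
// (&frac14;) entities are not handled and are left as they are.
final class NumericEntityDecoder {

    // "&#", 1 to 7 digits, then the required terminating ";"
    private static final Pattern DECIMAL_REF = Pattern.compile("&#(\\d{1,7});");

    static String decode(String input) {
        Matcher m = DECIMAL_REF.matcher(input);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            int codePoint = Integer.parseInt(m.group(1));
            // Numeric references name Unicode code points; Character.toChars also
            // produces a surrogate pair for values above 0xFFFF.
            String replacement = Character.isValidCodePoint(codePoint)
                    ? new String(Character.toChars(codePoint))
                    : m.group(); // out of range: leave the reference unchanged
            m.appendReplacement(out, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        String decoded = decode("&#188;&#198;&#208;&#230;");
        // Compare against Unicode escapes so the check doesn't depend on the
        // encoding of the source file or the console.
        System.out.println(decoded.equals("\u00BC\u00C6\u00D0\u00E6")); // true
    }
}
```

For real HTML a maintained library is the safer choice; for example, Apache Commons Text's StringEscapeUtils.unescapeHtml4 also understands named entities such as &frac14;.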
 
Saloon Keeper
Posts: 28227
There are 3 primary encodings used in the Western world for web pages: ISO-8859-1 (Latin-1), UTF-8, and Windows CP-1252. A stock web browser will render all of them fairly faithfully, at least if you bang on it a couple of times. UTF-8 is a variable-length encoding: ASCII characters take a single byte, but other characters take two to four bytes.
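The variable-length point is easy to check from Java. This sketch (class and method names are made up for illustration) counts the UTF-8 bytes for a few sample characters; it reports 1, 2, 3 and 4 bytes respectively, while the same 1/4 character takes a single byte in ISO-8859-1.

```java
import java.nio.charset.StandardCharsets;

// Illustrates that UTF-8 is variable-length: how many bytes a character takes
// depends on its code point. ISO-8859-1 always uses exactly one byte but can
// only represent code points 0 to 255.
final class Utf8Lengths {

    static int utf8Length(String s) {
        return s.getBytes(StandardCharsets.UTF_8).length;
    }

    public static void main(String[] args) {
        String[] samples = {
            "A",                                     // U+0041, plain ASCII
            "\u00BC",                                // U+00BC, the 1/4 fraction
            "\u20AC",                                // U+20AC, the euro sign
            new String(Character.toChars(0x1F600))   // U+1F600, an emoji outside the BMP
        };
        for (String s : samples) {
            // Prints e.g. "U+00BC -> 2 UTF-8 byte(s)"
            System.out.printf("U+%04X -> %d UTF-8 byte(s)%n", s.codePointAt(0), utf8Length(s));
        }
        // The same 1/4 character is a single byte (0xBC) in ISO-8859-1.
        System.out.println("\u00BC".getBytes(StandardCharsets.ISO_8859_1).length); // 1
    }
}
```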

Entities are not part of the encoding; they exist to assist in representing and rendering characters. ISO-8859-1 and CP-1252 are 8-bit encodings, so each can represent at most 256 distinct characters. Any Unicode character outside that set (for ISO-8859-1, anything with a code point higher than 255) has to be supplied using an alternative representation: XML/HTML entities.
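Producing that alternative representation can be sketched as follows: keep each character the target encoding can hold and write everything else as a numeric reference. This is a minimal sketch assuming decimal references are acceptable to whoever reads the output; UnencodableEscaper is a hypothetical name, and CharsetEncoder.canEncode does the actual check. It also shows the CP-1252 difference: that code page puts the euro sign at byte 0x80, so no entity is needed there.

```java
import java.nio.charset.Charset;
import java.nio.charset.CharsetEncoder;
import java.nio.charset.StandardCharsets;

// Hypothetical helper: replaces each character the target encoding cannot
// represent with a decimal numeric reference (&#N;). Markup characters such as
// '&' and '<' are NOT escaped here; that is a separate step.
final class UnencodableEscaper {

    static String escape(String text, Charset target) {
        CharsetEncoder encoder = target.newEncoder();
        StringBuilder out = new StringBuilder(text.length());
        text.codePoints().forEach(cp -> {
            String ch = new String(Character.toChars(cp));
            if (encoder.canEncode(ch)) {
                out.append(ch);                            // fits: keep the real character
            } else {
                out.append("&#").append(cp).append(';');   // doesn't fit: use an entity
            }
        });
        return out.toString();
    }

    public static void main(String[] args) {
        String price = "\u00BC kg costs \u20AC5";   // real 1/4 and euro characters
        // ISO-8859-1 has the 1/4 sign (188) but no euro sign (U+20AC = 8364),
        // so only the euro sign becomes &#8364;.
        System.out.println(escape(price, StandardCharsets.ISO_8859_1));
        // CP-1252 does have a euro sign, so nothing needs escaping.
        System.out.println(escape(price, Charset.forName("windows-1252")));
    }
}
```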

The XML standard, which shares its SGML roots with HTML, defines 5 named entities (&amp;, &apos;, &quot;, &lt;, and &gt;). Even though these particular characters carry numeric codes below 128 (they are plain ASCII), using the entities keeps parsers from becoming confused, for example from seeing a quote mark that is supposed to be part of a quoted value as the terminator of that value. Any code value (character) can be represented in entity form, but in practice that would be tedious, so such forms are only used when a more succinct code isn't available.
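Escaping those five characters is a small loop. This is a sketch with a hypothetical XmlEscaper class; working character by character avoids the classic bug of chaining String.replace calls in the wrong order and re-escaping the '&' of an entity that was just produced. Note that &apos; is an XML entity that HTML 4 did not define, which is why HTML output often uses &#39; for the apostrophe instead.

```java
// Hypothetical helper: escapes the five characters XML predefines entities for.
final class XmlEscaper {

    static String escape(String s) {
        StringBuilder out = new StringBuilder(s.length());
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            switch (c) {
                case '&':  out.append("&amp;");  break;
                case '<':  out.append("&lt;");   break;
                case '>':  out.append("&gt;");   break;
                case '"':  out.append("&quot;"); break;
                case '\'': out.append("&apos;"); break;
                default:   out.append(c);        // everything else passes through
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(escape("<a title=\"Tom & Jerry's\">"));
        // &lt;a title=&quot;Tom &amp; Jerry&apos;s&quot;&gt;
    }
}
```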

HTML supports some additional named entities beyond the basic XML set, but I don't happen to have an official list to hand. In any event, what you seemed to want to represent was the character for the fraction 1/4. Its numeric entity is &#188; (188 is its code in both ISO-8859-1 and Unicode), and I note that, yes, it also carries an official HTML entity name: &frac14;.
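The decimal and hexadecimal spellings of the same reference can both be derived from the code point. A small sketch with hypothetical helper names; Character.getName (Java 7 and later) returns the official Unicode name, which confirms that 188 is the quarter fraction.

```java
// Hypothetical helpers showing that &#188;, &#xBC; and &frac14; all denote
// one character: Unicode code point U+00BC, decimal 188.
final class CodePointInfo {

    static String decimalRef(int codePoint) {
        return "&#" + codePoint + ";";
    }

    static String hexRef(int codePoint) {
        return String.format("&#x%X;", codePoint);
    }

    public static void main(String[] args) {
        int quarter = 0xBC;
        System.out.println(decimalRef(quarter));         // &#188;
        System.out.println(hexRef(quarter));             // &#xBC;
        System.out.println(Character.getName(quarter));  // VULGAR FRACTION ONE QUARTER
    }
}
```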

The syntax for an entity is "&value;" (numeric entities add a '#', as in "&#188;"). Note the terminating semicolon. Some people write sloppy webpages and some browsers will forgive the absence of that terminator, but not all.
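The semicolon rule can be made explicit in a decoder. This sketch only recognizes a reference when it ends in ';', and handles decimal, hexadecimal and a few named forms. The NAMED map is a tiny illustrative subset, not the full HTML entity table, StrictEntityDecoder is a hypothetical name, and Map.of needs Java 9 or later; anything the decoder doesn't recognize is left exactly as written.

```java
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical strict decoder: a reference is only recognized when it ends in
// ';'. Handles decimal (&#188;), hexadecimal (&#xBC;) and a few named forms.
final class StrictEntityDecoder {

    // Tiny illustrative subset of named entities, NOT the full HTML list.
    private static final Map<String, Integer> NAMED = Map.of(
            "amp", 0x26, "lt", 0x3C, "gt", 0x3E,
            "quot", 0x22, "apos", 0x27, "frac14", 0xBC);

    // group 1: decimal digits, group 2: hex digits, group 3: entity name
    private static final Pattern REF = Pattern.compile(
            "&(?:#(\\d{1,7})|#[xX]([0-9a-fA-F]{1,6})|([A-Za-z][A-Za-z0-9]*));");

    static String decode(String input) {
        Matcher m = REF.matcher(input);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            Integer codePoint;
            if (m.group(1) != null) {
                codePoint = Integer.parseInt(m.group(1));
            } else if (m.group(2) != null) {
                codePoint = Integer.parseInt(m.group(2), 16);
            } else {
                codePoint = NAMED.get(m.group(3)); // null if the name is unknown
            }
            String replacement = (codePoint != null && Character.isValidCodePoint(codePoint))
                    ? new String(Character.toChars(codePoint))
                    : m.group(); // unknown or out of range: leave it as written
            m.appendReplacement(out, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        String decoded = decode("&frac14; = &#188; = &#xBC;, but &frac14 stays");
        // The unterminated "&frac14" is not decoded.
        System.out.println(decoded.equals("\u00BC = \u00BC = \u00BC, but &frac14 stays")); // true
    }
}
```

Because the input is scanned once, "&amp;lt;" decodes to the text "&lt;" rather than to '<'.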

When you template a webpage using an entity, the entity itself is encoded explicitly on the page. When the page is sent to the client, the entity is carried explicitly to the client, and it is the client's job to recognize the entity and render the character it stands for.

Within Java strings, entities have no special powers, and the same goes for simply printing with System.out/System.err. Instead, the String should contain the actual Unicode character whenever you treat it as a character; only when the String is sent out in a webpage would it contain entity encoding, and that would be solely for the client's benefit. The task of actually converting to and from entity encoding is up to external logic.
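A short sketch of the "no special powers" point, using nothing beyond the JDK: the entity string is just six characters, and an encoding only comes into play when the real character is turned into bytes for output. The class and the hex helper are hypothetical.

```java
import java.nio.charset.StandardCharsets;

// In Java an entity is just text: "&#188;" is six ordinary characters, while
// the character it names is a single char. Only when the text is written out
// does an encoding turn characters into bytes.
final class EntityVsChar {

    // Formats bytes as space-separated hex pairs, e.g. "C2 BC".
    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String entity = "&#188;";   // & # 1 8 8 ; as plain characters
        String actual = "\u00BC";   // the 1/4 character itself
        System.out.println(entity.length());          // 6
        System.out.println(actual.length());          // 1
        System.out.println(entity.equals(actual));    // false
        System.out.println(entity);                   // &#188; (printed literally, not interpreted)
        System.out.println(hex(actual.getBytes(StandardCharsets.ISO_8859_1))); // BC
        System.out.println(hex(actual.getBytes(StandardCharsets.UTF_8)));      // C2 BC
    }
}
```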
 
Marshal
Posts: 28304

vdammala vkumar wrote: I am trying to convert the "EntityNumber" given in that url into "Character".



We would have to know how you are getting the entity &#188; in the first place. Is it in a text file you are reading? Why do you need to convert it? It may be that you don't have to do anything at all with it; perhaps your problem is that you are displaying the file using the wrong encoding. More details would be useful.
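The wrong-encoding possibility is easy to reproduce. This sketch uses in-memory byte arrays as stand-ins for file contents (the class name is hypothetical); with a real file, Files.newBufferedReader(path, StandardCharsets.ISO_8859_1), or Files.readString(path, charset) on Java 11 and later, lets you name the charset explicitly instead of relying on the platform default.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Decodes the same bytes with the right and the wrong charset. Results are shown
// as code points (U+XXXX) so the output doesn't depend on the console's encoding.
final class WrongCharsetDemo {

    static String codePoints(byte[] bytes, Charset charset) {
        String text = new String(bytes, charset);   // malformed input becomes U+FFFD
        StringBuilder sb = new StringBuilder();
        text.codePoints().forEach(cp -> {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("U+%04X", cp));
        });
        return sb.toString();
    }

    public static void main(String[] args) {
        byte[] latin1Quarter = { (byte) 0xBC };             // 1/4 as an ISO-8859-1 file stores it
        byte[] utf8Quarter = { (byte) 0xC2, (byte) 0xBC };  // 1/4 as a UTF-8 file stores it

        System.out.println(codePoints(latin1Quarter, StandardCharsets.ISO_8859_1)); // U+00BC (correct)
        System.out.println(codePoints(latin1Quarter, StandardCharsets.UTF_8));      // U+FFFD (replacement char)
        System.out.println(codePoints(utf8Quarter, StandardCharsets.ISO_8859_1));   // U+00C2 U+00BC, shown as "Â¼"
        System.out.println(codePoints(utf8Quarter, StandardCharsets.UTF_8));        // U+00BC (correct)
    }
}
```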
 