How to convert ISO-8859-1 to normal characters in Java

 
Ranch Hand
Posts: 64
Hi,

How can I convert a string of numeric entities into the characters "¼ÆÐæ"? (I am not able to post the source string as-is because the forum converts it; its first character is written as '&' followed by '#' followed by '188'.) Please help.


Regards,
Vijay
 
Bartender
Posts: 1210
There is nothing abnormal about UTF-8.
Please read these 2 articles on characters and related concepts like encodings and glyphs - article #1 and article #2.
With that understanding, come back and explain what exactly is the problem you are having. "Convert UTF-8 to normal characters" is highly unlikely to be the actual problem or a solution to the problem.
 
vdammala vkumar
Ranch Hand
Posts: 64
Hi Karthik,

Thanks for responding to my query.

I haven't gone through those articles yet; I will start looking into them now.

Please have a look at this URL: http://www.w3schools.com/charsets/ref_html_8859.asp

I am trying to convert the "EntityNumber" given at that URL into the corresponding "Character".
 
Bartender
Posts: 20728
There are 3 primary encodings used in the Western world for web pages: ISO-8859-1 (Latin-1), UTF-8, and Windows CP-1252. They will all render fairly faithfully on a stock web browser - at least if you bang on it a couple of times. UTF-8 is a variable-width encoding: a character is not always stored in a single byte and can take up to four.

Entities are not part of the encoding; they exist to assist in representing and rendering characters. ISO-8859 and CP-1252 are 8-bit encodings. That means that characters the encoding has no byte for (for ISO-8859-1, any Unicode value higher than 255) have to be supplied using an alternative representation - XML/HTML entities.
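To make the byte-width point concrete, here is a minimal Java sketch. The class and method names (EncodingWidthDemo, latin1Length, utf8Length, fitsInLatin1) are made up for illustration; only the JDK calls are real. It counts the bytes a string occupies in each encoding and checks whether ISO-8859-1 can represent it at all.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper class, for illustration only.
final class EncodingWidthDemo {

    /** Number of bytes the text occupies when encoded as ISO-8859-1. */
    static int latin1Length(String text) {
        return text.getBytes(StandardCharsets.ISO_8859_1).length;
    }

    /** Number of bytes the text occupies when encoded as UTF-8. */
    static int utf8Length(String text) {
        return text.getBytes(StandardCharsets.UTF_8).length;
    }

    /** True if every character of the text has a byte in ISO-8859-1. */
    static boolean fitsInLatin1(String text) {
        return StandardCharsets.ISO_8859_1.newEncoder().canEncode(text);
    }

    public static void main(String[] args) {
        String quarter = "\u00BC"; // U+00BC, the one-quarter fraction
        String euro = "\u20AC";    // U+20AC, the euro sign (above 255)

        System.out.println(latin1Length(quarter)); // one byte: 0xBC
        System.out.println(utf8Length(quarter));   // two bytes: 0xC2 0xBC
        System.out.println(fitsInLatin1(quarter)); // true
        System.out.println(fitsInLatin1(euro));    // false
    }
}
```

A character for which fitsInLatin1 returns false is the kind that would have to travel as an entity on an ISO-8859-1 page.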

The XML standard (a sibling of HTML; both descend from SGML) defines 5 named entities: &amp;amp; &amp;apos; &amp;quot; &amp;lt; and &amp;gt;, standing for & ' " < and >. Even though these particular values carry numeric codes less than 255, using them allows parsers to avoid becoming confused and seeing a quote mark that is supposed to be part of a text element as being the terminator of the text element. Any code value (character) can be represented in entity form, but in practice it would be tedious, so such forms are only used when a more succinct code isn't available.
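As a rough illustration of why those five characters get escaped, here is a small sketch that replaces each of them with its predefined XML entity. XmlEscaper and escape are hypothetical names chosen for this example, not part of any library.

```java
// Hypothetical helper class, for illustration only.
final class XmlEscaper {

    /** Replaces the five XML special characters with their predefined entities. */
    static String escape(String text) {
        StringBuilder out = new StringBuilder(text.length());
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            switch (c) {
                case '&':  out.append("&amp;");  break;
                case '\'': out.append("&apos;"); break;
                case '"':  out.append("&quot;"); break;
                case '<':  out.append("&lt;");   break;
                case '>':  out.append("&gt;");   break;
                default:   out.append(c);        // everything else passes through
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // The quote marks and angle bracket can no longer be mistaken for markup.
        System.out.println(escape("a < b & \"c\""));
    }
}
```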

HTML supports some additional named entities beyond the basic XML set, but I don't happen to have an official list to hand. In any event, what you seemed to want to represent was the character designating a fraction of 1/4. The numeric entity code for that is &amp;#188;, which renders as ¼ under ISO-8859-1, and I note that yes, it does carry an official HTML entity name (frac14).

The syntax for an entity is "&value;". Note the terminal semicolon. Some people write sloppy webpages and some browsers will forgive the absence of that terminator, but not all.

When you template a webpage using an entity, the entity itself is encoded on the page explicitly. When the page is sent to the client, the entity is carried explicitly to the client. It is the client's job to recognize the entity and render it according to the code page in effect. Within Java strings, entities have no special powers, and the same is true when you simply print using System.out/System.err. A Java String should hold the actual Unicode character value when you treat it as text; it should contain entity encoding only when it is about to be sent out in a webpage, and then solely for the client's benefit. The task of actually converting to and from entity encoding is up to external logic.
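One possible shape for that "external logic", going from entity to character, is sketched below. It handles only numeric entities (decimal such as 188, or hexadecimal such as xBC), written with the terminating semicolon. NumericEntityDecoder and decode are hypothetical names for this example. Named entities such as frac14 are left alone, and anything that does not parse to a valid Unicode code point is passed through unchanged.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class, for illustration only.
final class NumericEntityDecoder {

    // Group 1: decimal digits. Group 2: hex digits after an 'x' or 'X'.
    private static final Pattern NUMERIC_ENTITY =
            Pattern.compile("&#(?:(\\d{1,7})|[xX]([0-9a-fA-F]{1,6}));");

    /** Replaces numeric entities with the characters they denote. */
    static String decode(String text) {
        Matcher m = NUMERIC_ENTITY.matcher(text);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            int codePoint = (m.group(1) != null)
                    ? Integer.parseInt(m.group(1))
                    : Integer.parseInt(m.group(2), 16);
            // Leave the entity as written if the number is not a code point.
            String replacement = Character.isValidCodePoint(codePoint)
                    ? new String(Character.toChars(codePoint))
                    : m.group();
            m.appendReplacement(out, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        // Entities 188, 198, 208 and 230 are the four characters from the question.
        System.out.println(decode("&#188;&#198;&#208;&#230;"));
    }
}
```

If named entities also need handling, a maintained library is probably a better choice than extending this by hand; Apache Commons Text, for example, provides StringEscapeUtils.unescapeHtml4.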
 
Marshal
Posts: 24467

vdammala vkumar wrote: I am trying to convert the "EntityNumber" given in that url into "Character".



We would have to know how you are getting the entity &amp;#188; (¼) for a start. Is it in a text file you are reading? Why do you need to convert it? It may be that you don't have to do anything at all with it -- perhaps your problem is that you are displaying the file using the wrong encoding. More details would be useful.
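To show what "the wrong encoding" can look like, here is a small sketch under the assumption that the data was written as UTF-8 and then read back as ISO-8859-1. The names MojibakeDemo, misreadUtf8AsLatin1 and roundTripUtf8 are invented for this example.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper class, for illustration only.
final class MojibakeDemo {

    /** Encodes the text as UTF-8, then (wrongly) decodes the bytes as ISO-8859-1. */
    static String misreadUtf8AsLatin1(String text) {
        byte[] utf8Bytes = text.getBytes(StandardCharsets.UTF_8);
        return new String(utf8Bytes, StandardCharsets.ISO_8859_1);
    }

    /** Encodes and decodes with the same charset, so the text survives intact. */
    static String roundTripUtf8(String text) {
        byte[] utf8Bytes = text.getBytes(StandardCharsets.UTF_8);
        return new String(utf8Bytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String quarter = "\u00BC"; // U+00BC, the one-quarter fraction

        // The two UTF-8 bytes 0xC2 0xBC become two separate Latin-1 characters.
        System.out.println(misreadUtf8AsLatin1(quarter).length()); // 2
        System.out.println(roundTripUtf8(quarter).equals(quarter)); // true
    }
}
```

When the one-quarter sign shows up with a stray extra character in front of it, a mismatch like this is a likely cause, and the fix is to read the data with the charset it was written in rather than to convert anything afterwards.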
 