
Using characters like quote, ampersand in xml

 
Ranch Hand
Posts: 148
Currently I am using encoding="ISO-8859-1" for my XML file. To parse this XML I am using a SAX parser.

Can anyone tell me how I can include characters like ", & and < in the content of my XML tags?
Can UTF-8 be the solution? If yes, how?
 
author
Posts: 11962
"&", "<" and ">", for example, are illegal characters in XML regardless of the encoding you specify. You need to encode those special characters with entities like "&amp;", "&lt;" and "&gt;".
 
Pradeep Kadambar
Ranch Hand
Posts: 148
I am getting the data from the user and hence cannot restrict these characters.

E.g. a label can contain "Boeing" or, say, "Airbus", quotes included.
Then the XML must contain "&quot;Boeing&quot;".

Please suggest how I can do this programmatically.
 
Author and all-around good cowpoke
Posts: 13078
I just got through handling a similar problem. You can use the java.util.regex classes to do bulk replacement of the problem characters. For example, I have a class with:


The replacement operation looks like:

where tmp is a single line of input text - all occurrences of the recognized pattern get replaced with "& amp;".
However, this all gets very tricky with & because the & appears in the encoding for other special characters. So I first change all "& lt;" to "@lt;" (etc.), then handle the single &, then change all "@lt;" back to "& lt;".
It goes surprisingly fast.
Note that I had to insert spaces in "& lt;" etc. in this post to get it to render properly.
Bill

[ October 24, 2004: Message edited by: William Brogden ]
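
A minimal sketch of the replacement step described in the post above, not Bill's actual code. It uses java.util.regex as he suggests, but swaps the "@lt;" placeholder round trip for a negative lookahead, so an ampersand is left alone only when it already starts an entity or a numeric character reference. The class name BareAmpersandFixer and the method fix are hypothetical; tmp is the single line of input text mentioned in the post.

```java
import java.util.regex.Pattern;

public final class BareAmpersandFixer {
    // Matches '&' unless it already begins one of the five predefined
    // entities or a numeric reference such as &#233; or &#xE9;.
    private static final Pattern BARE_AMP = Pattern.compile(
            "&(?!(?:amp|lt|gt|quot|apos);|#[0-9]+;|#x[0-9A-Fa-f]+;)");

    private BareAmpersandFixer() {}

    /** Escapes only the ampersands that are not part of an existing entity. */
    public static String fix(String tmp) {
        return BARE_AMP.matcher(tmp).replaceAll("&amp;");
    }

    public static void main(String[] args) {
        // prints: AT&amp;T &lt;b&gt; &amp; more
        System.out.println(fix("AT&T &lt;b&gt; &amp; more"));
    }
}
```

Like the placeholder approach, this assumes the text is already partly escaped. If a user literally types the five characters of an entity, they pass through unchanged.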
 
Pradeep Kadambar
Ranch Hand
Posts: 148
Thanks William. This is OK as long as you are dealing with ASCII characters. Will the XML file still be valid if I support other languages such as German or Latin?

If possible, please let me know how to write an XML format that works across all languages and character sets (Unicode).
 
William Brogden
Author and all-around good cowpoke
Posts: 13078
In theory, the Java XML libraries should be able to handle any valid Unicode characters, but I don't have enough experience to advise you on what problems you would face.
Bill
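
One way to try this out, as a sketch only: let the JDK's StAX writer (javax.xml.stream) produce the file as UTF-8 and do the escaping of the markup characters itself. The class name Utf8LabelWriter, the method write and the element name label are assumptions made for illustration.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamWriter;

public final class Utf8LabelWriter {
    private Utf8LabelWriter() {}

    /** Writes <label>text</label> as a UTF-8 encoded XML document. */
    public static byte[] write(String text) throws XMLStreamException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        XMLStreamWriter writer = XMLOutputFactory.newInstance()
                .createXMLStreamWriter(bytes, "UTF-8");
        writer.writeStartDocument("UTF-8", "1.0");
        writer.writeStartElement("label");
        // writeCharacters escapes markup characters such as & and <
        writer.writeCharacters(text);
        writer.writeEndElement();
        writer.writeEndDocument();
        writer.close();
        return bytes.toByteArray();
    }

    public static void main(String[] args) throws XMLStreamException {
        // \u00FC and \u00F6 are the German letters u-umlaut and o-umlaut
        byte[] xml = write("\"M\u00FCller & S\u00F6hne\" <GmbH>");
        System.out.println(new String(xml, StandardCharsets.UTF_8));
    }
}
```

Reading the result back with a parser and comparing it to the original string is a quick check that both the special characters and the non-ASCII letters survive.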
 
Pradeep Kadambar
Ranch Hand
Posts: 148
Thanks William, it's been a great help to me.

I came up with the idea of writing the special ASCII characters in my UTF-8 file as numeric character references: "&#" followed by the character's ASCII value and a semicolon. It works fine for all keyboard entries (English US).
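
A minimal sketch of that idea, assuming the goal is to turn the five markup characters and anything beyond printable ASCII into decimal references. The class name NumericRefEscaper and the method escape are hypothetical.

```java
public final class NumericRefEscaper {
    private NumericRefEscaper() {}

    /** Writes markup characters and non-ASCII code points as &#N; references. */
    public static String escape(String text) {
        StringBuilder out = new StringBuilder();
        // Iterate over code points so characters outside the BMP
        // become one reference instead of two surrogate halves.
        text.codePoints().forEach(cp -> {
            boolean markup = cp == '&' || cp == '<' || cp == '>'
                    || cp == '"' || cp == '\'';
            if (markup || cp > 0x7E) {
                out.append("&#").append(cp).append(';');
            } else {
                // Control characters are passed through untouched here;
                // most of them are not legal in XML 1.0 in any form.
                out.append((char) cp);
            }
        });
        return out.toString();
    }

    public static void main(String[] args) {
        // prints: &#34;Boeing&#34; &#38; &#233;
        System.out.println(escape("\"Boeing\" & \u00E9"));
    }
}
```

Because the output is pure ASCII, the result reads the same whether the file is declared as ISO-8859-1 or UTF-8.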
 