Escape Encoding for XML attribute

 
Greenhorn
Posts: 16
I am creating XML with Xerces, using serialization.

I have a requirement where an XML tag, <Turnover>, has an attribute named company.
Normally it looks like this:
<Turnover company="XYZ">28733</Turnover>

But if the company name is ваыс, I escape it to &#1074;&#1072;&#1099;&#1089; using StringEscapeUtils.escapeXml(String), and when I serialize, it appears like this:

<Turnover company="&amp;#1074;&amp;#1072;&amp;#1099;&amp;#1089;">28733</Turnover>
 
Java Cowboy
Posts: 16084
But is it a problem that the non-Latin letters are encoded using entity escape codes? Is there a reason why you want to have the characters literally in your XML file?

What matters is that when another program reads the XML, it will get the right characters back. And that will happen automatically with these escape codes, if the other program uses an XML parser that is any good.
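To illustrate that point, here is a minimal sketch using the JDK's built-in DOM parser (javax.xml.parsers) rather than calling Xerces directly. The class name ParseTurnover and the helper companyOf are made up for illustration. It parses the same <Turnover> element written once with numeric character references and once with literal Cyrillic characters; both should yield the same attribute value.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;

public class ParseTurnover {
    // Parses an XML string (fed to the parser as UTF-8 bytes) and returns the root's "company" attribute.
    static String companyOf(String xml) throws Exception {
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        byte[] bytes = xml.getBytes(StandardCharsets.UTF_8);
        Element root = builder.parse(new ByteArrayInputStream(bytes)).getDocumentElement();
        return root.getAttribute("company");
    }

    public static void main(String[] args) throws Exception {
        // Numeric character references for the Cyrillic name.
        String escaped = "<Turnover company=\"&#1074;&#1072;&#1099;&#1089;\">28733</Turnover>";
        // The same name as literal characters (Unicode escapes keep this source file pure ASCII).
        String literal = "<Turnover company=\"\u0432\u0430\u044b\u0441\">28733</Turnover>";

        // A conforming parser resolves both forms to the same four characters.
        System.out.println(companyOf(escaped).equals(companyOf(literal)));
        System.out.println(companyOf(literal).equals("\u0432\u0430\u044b\u0441"));
    }
}
```

Both forms are well-formed XML, so the reading program does not need to care which one the writer chose.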
 
ashwin vulugundam
Greenhorn
Posts: 16
Yes, this is happening for non-Latin letters.
I use StringEscapeUtils.escapeXml(String) to convert these letters to the XML standard.
If I don't, the exact non-Latin letters are published in the XML, which violates the XML standard.
 
Marshal
Posts: 80977
I think this thread would sit better on our XML forum. Moving.
 
ashwin vulugundam
Greenhorn
Posts: 16
I want to shed more light on my problem.
I am using the Xerces library.
I had a similar issue with element text containing foreign characters (Chinese, Russian, etc.).
I was converting it to UTF-8 with the StringEscapeUtils class (Apache Commons library), which has an escapeXml method. This method converts the content to the proper UTF-8 representation.
For example, ивыф gets converted to something like &#231;&#345;.


After the Document object was prepared, I serialized it using OutputFormat, which is also part of Xerces. After serialization I found that it had escaped the already-escaped string again, because a bare & is not allowed in XML:
&amp;#231;&amp;#345;

OutputFormat has a method, setNonEscapingElements, which stops the serializer from escaping the text of the listed elements a second time.


The irony is that there is no equivalent way to stop the serializer from escaping attribute values.
Can anyone give me some valuable suggestions on this?
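The double escaping described in this post can be reproduced in a few lines. This sketch uses the JDK's identity Transformer (javax.xml.transform) as a stand-in for the Xerces OutputFormat/XMLSerializer pair, on the assumption that any serializer producing well-formed output must escape a literal & inside an attribute value. DoubleEscapeDemo and serialize are hypothetical names.

```java
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

public class DoubleEscapeDemo {
    // Builds <Turnover company="...">28733</Turnover> and serializes it to a string.
    static String serialize(String company) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
        Element turnover = doc.createElement("Turnover");
        turnover.setAttribute("company", company); // stored verbatim, '&' included
        turnover.setTextContent("28733");
        doc.appendChild(turnover);

        Transformer transformer = TransformerFactory.newInstance().newTransformer();
        transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
        StringWriter out = new StringWriter();
        transformer.transform(new DOMSource(doc), new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        // A hand-escaped value, like the one escapeXml produced in this thread.
        String preEscaped = "&#1074;&#1072;&#1099;&#1089;";
        // Expected attribute in the output: company="&amp;#1074;&amp;#1072;&amp;#1099;&amp;#1089;"
        System.out.println(serialize(preEscaped));
    }
}
```

The serializer has no way of knowing that "&#1074;" was meant as markup; to the DOM it is just text that happens to contain an ampersand, so escaping it is the correct behaviour.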

 
Sheriff
Posts: 28436
I missed the explanation of why you decided to use that Apache StringEscapeUtils code. The Xerces serializer can produce a document encoded in UTF-8 perfectly well all by itself. You don't need to do anything at all to the data. So why did you decide to use StringEscapeUtils? It seems totally unnecessary to me.
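A minimal sketch of that suggestion, again assuming the JDK's javax.xml.transform serializer in place of the Xerces OutputFormat/XMLSerializer API used in the thread: put the raw text into the DOM, ask for UTF-8 output, and let the serializer handle everything else. Utf8Turnover and toUtf8 are illustrative names.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

public class Utf8Turnover {
    // Serializes <Turnover company=...> to UTF-8 bytes without any manual escaping.
    static byte[] toUtf8(String company, String amount) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
        Element turnover = doc.createElement("Turnover");
        turnover.setAttribute("company", company); // raw text, no StringEscapeUtils
        turnover.setTextContent(amount);
        doc.appendChild(turnover);

        Transformer transformer = TransformerFactory.newInstance().newTransformer();
        transformer.setOutputProperty(OutputKeys.ENCODING, "UTF-8");
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        transformer.transform(new DOMSource(doc), new StreamResult(out));
        return out.toByteArray();
    }

    public static void main(String[] args) throws Exception {
        // Unicode escapes keep this file independent of javac's -encoding setting.
        String company = "\u0432\u0430\u044b\u0441";
        String xml = new String(toUtf8(company, "28733"), StandardCharsets.UTF_8);
        System.out.println(xml.contains("company=\"" + company + "\"")); // literal Cyrillic present
        System.out.println(xml.contains("&#"));                          // no character references
    }
}
```

Because nothing was escaped by hand, there is no &amp;#... to suppress, for attributes or for element text, and a parser reading these bytes gets ваыс back.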
 
ashwin vulugundam
Greenhorn
Posts: 16
If I don't use StringEscapeUtils, I get exactly the same string.
For instance, if the element text is ваыс,
that exact string ends up in the XML, which is not in line with the XML standard.

That is the reason I am using StringEscapeUtils.
 
Paul Clapham
Sheriff
Posts: 28436

ashwin vulugundam wrote: for instance element text is ваыс
the exact string will be there in the XML, which is not in sync with XML standard.



What makes you think that? And what exactly do you think is wrong with it?
 
ashwin vulugundam
Greenhorn
Posts: 16
If the element comes out like this in the XML:
<element>ваыс</element>

that is not the correct way. I want it in the UTF-8 standard, which would look something like this:
<element>&#231;&#232;&#234;</element>

and when I open this XML, I want these UTF-8 equivalent characters to appear as ваыс.

So the point I would like to make is that the Xerces serializer does not explicitly convert ваыс. That is why I use StringEscapeUtils to convert it to the proper UTF-8 form, which would look like &#231;&#232;&#234;. Because the ampersand (&) is not allowed in XML, the serializer converts it to &amp;, which I do not want because I have already escaped the text. So I use setNonEscapingElements, which tells the serializer to take the string as-is and not escape it further.
But for an attribute, I have no way to turn off this escaping in Xerces.
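This post treats numeric character references such as &#1074; as "UTF-8". A small sketch separates the two ideas: UTF-8 is how characters become bytes in the file, while &#...; is an XML-level escape that names a character by its code point and works in any file encoding. The class and method names are hypothetical.

```java
import java.nio.charset.StandardCharsets;
import java.util.StringJoiner;

public class Utf8VersusCharRefs {
    // Hex dump of the string's UTF-8 bytes: this is what "UTF-8" actually refers to.
    static String utf8Hex(String s) {
        StringJoiner hex = new StringJoiner(" ");
        for (byte b : s.getBytes(StandardCharsets.UTF_8)) {
            hex.add(String.format("%02X", b & 0xFF));
        }
        return hex.toString();
    }

    // Decimal numeric character references, one per code point.
    static String charRefs(String s) {
        StringBuilder sb = new StringBuilder();
        s.codePoints().forEach(cp -> sb.append("&#").append(cp).append(';'));
        return sb.toString();
    }

    public static void main(String[] args) {
        String company = "\u0432\u0430\u044b\u0441"; // the Cyrillic name from the thread
        System.out.println(utf8Hex(company));  // D0 B2 D0 B0 D1 8B D1 81
        System.out.println(charRefs(company)); // &#1074;&#1072;&#1099;&#1089;
    }
}
```

A UTF-8 file containing ваыс literally stores those eight bytes; the character-reference form is plain ASCII text that a parser turns back into the same four characters.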
 
Ranch Hand
Posts: 734
@op:
When one does not see what one expects, or javac will not compile without warnings or even errors, it is easy to get carried away imagining that this or that character causes trouble by violating some imagined "XML standard". In fact, most of the time that is not the cause at all, and many of the statements made here are far off. What actually causes the problem is sometimes hard to determine without a full description of the process, from editing the .java file to where the output goes for inspection.

[1] Pick a text editor capable of UTF-8 encoding.
[2] Create the XML through Xerces (using DOM, I suppose) with a line such as this:

[3] Save the .java file in UTF-8, which the text editor supports.
[4] Compile it with javac, adding the switch "-encoding utf8".

Inspect the output file using the encoding Xerces (or Xalan) was instructed to write. I think you'll see the correct result. Be careful with output to the cmd console (if you're on Windows), as System.out is often quite misleading there, though it can be made to work with further settings in the application.
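On the console point, a sketch of how correct text can look broken purely because of the output stream's charset. It assumes ISO-8859-1 as a stand-in for a narrow console code page (the actual Windows cmd code page varies, for example 437 or 850) and needs Java 10+ for the PrintStream(OutputStream, boolean, Charset) constructor. The class name is made up.

```java
import java.io.ByteArrayOutputStream;
import java.io.PrintStream;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class ConsoleCharsetDemo {
    // Prints text through a PrintStream using the given charset and returns the bytes it emitted.
    static byte[] printThrough(String text, Charset charset) {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        PrintStream out = new PrintStream(sink, true, charset);
        out.print(text);
        out.flush();
        return sink.toByteArray();
    }

    public static void main(String[] args) {
        String company = "\u0432\u0430\u044b\u0441";
        // ISO-8859-1 cannot represent Cyrillic, so the encoder substitutes '?' for each letter.
        byte[] latin1 = printThrough(company, StandardCharsets.ISO_8859_1);
        System.out.println(new String(latin1, StandardCharsets.ISO_8859_1)); // ????
        // With UTF-8 the bytes decode back to the original string.
        byte[] utf8 = printThrough(company, StandardCharsets.UTF_8);
        System.out.println(new String(utf8, StandardCharsets.UTF_8).equals(company)); // true
    }
}
```

The string in memory is the same in both cases; only the stream's charset differs, which is why inspecting the output file in its declared encoding is more reliable than reading the console.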
 
Paul Clapham
Sheriff
Posts: 28436
Exactly. And this so-called "UTF-8 standard" is an imagined one too.
 