
apostrophe not getting rendered correctly

 
Ranch Hand
XStream is used to generate XML in our application, but I don't think this problem is specific to XStream. I'm wondering if there's something I can do in Java to make sure that this problem doesn't happen.

We are sending an XML file to another internal application, and I'm having trouble with an apostrophe in one of the XML nodes. It's a simple object with no special converter registered for it, and its comments field is just a String. We are seeing different behavior in two UNIX testing environments, and I don't even know where to begin looking for the cause. At first I assumed their parser was mishandling something, but on investigation we found that this is what they see in the XML file they receive from us:


XML: <comments>These are Stephen&apos;s comments.</comments>


This renders correctly as: These are Stephen's comments.


In another environment (that should be the same as far as character encodings are concerned), here's what they see coming from us:

XML: <comments>These are Stephen&#39;s comments.</comments>

"Stephen's" is incorrectly rendered as the literal text Stephen&#39;s.

I don't know how this could be happening if the code is the same. Has anyone seen something like this before? Is there a way I can put in a bit of safety code to make sure this doesn't happen in any environment?
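Nothing later in this thread pins down the cause, so the following is only a sketch under an assumption: that in the bad environment the comments String sometimes reaches XStream already containing a character reference (for example, because another layer escaped it first), so the serializer escapes its ampersand a second time. The class and method names (`EscapeGuard`, `looksAlreadyEscaped`) are hypothetical, not part of XStream or the JDK; the idea is simply to flag suspicious values before serialization.

```java
import java.util.regex.Pattern;

public class EscapeGuard {
    // Hypothetical guard: matches text that looks like an XML character or
    // entity reference, e.g. &#39;  &#x27;  &apos;  &amp;
    private static final Pattern REFERENCE =
            Pattern.compile("&(#[0-9]+|#[xX][0-9a-fA-F]+|[A-Za-z][A-Za-z0-9]*);");

    // Returns true if the raw value already contains something that looks
    // escaped, i.e. escaping it again would produce a double-escaped value.
    public static boolean looksAlreadyEscaped(String raw) {
        return raw != null && REFERENCE.matcher(raw).find();
    }

    public static void main(String[] args) {
        System.out.println(looksAlreadyEscaped("These are Stephen's comments."));     // false
        System.out.println(looksAlreadyEscaped("These are Stephen&#39;s comments.")); // true
    }
}
```

This is a heuristic: a user who genuinely types something like "&amp;" into the comments would also be flagged, so it is better suited to logging a warning than to rejecting or rewriting data.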

Thanks,
Stephen

[ May 01, 2008: Message edited by: Stephen Huey ]

 
Marshal

Originally posted by Stephen Huey:
In another environment (that should be the same as far as character encodings are concerned), here's what they see coming from us:

XML: <comments>These are Stephen&#39;s comments.</comments>

"Stephen's" gets incorrectly rendered as Stephen followed by an ampersand followed by #39; followed by the final s.

I don't know why different versions of the XML are being produced, but what you posted there isn't incorrect and it's equivalent to the other example. (That's a numeric character reference for the apostrophe.) There shouldn't be any complaints about it, except misguided ones from people who are eyeballing the XML or people who are using non-compliant XML parsers.
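To check the equivalence described above, you can parse both variants with the JDK's built-in DOM parser and compare the text content; a compliant parser resolves the predefined entity `&apos;` and the character reference `&#39;` to the same apostrophe. A minimal sketch (the class and method names are illustrative only):

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;

public class ApostropheEquivalence {
    // Parses a small XML document and returns the text content of its root element.
    public static String rootText(String xml) {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            Document doc = builder.parse(
                    new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            return doc.getDocumentElement().getTextContent();
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse: " + xml, e);
        }
    }

    public static void main(String[] args) {
        String named = "<comments>These are Stephen&apos;s comments.</comments>";
        String numeric = "<comments>These are Stephen&#39;s comments.</comments>";
        // Both references resolve to the same apostrophe character.
        System.out.println(rootText(named));                           // These are Stephen's comments.
        System.out.println(rootText(named).equals(rootText(numeric))); // true
    }
}
```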
 
Stephen Huey
Ranch Hand
I typed the text incorrectly for the second (bad) environment (actually, the browser rendered it differently from the way I typed it in). What we saw in the XML in the second environment was:

Stephen&amp;#39;s (that is, "Stephen", then the ampersand symbol, then "amp;", then "#39;s")

This gets incorrectly rendered as the literal text Stephen&#39;s

So I know that #39 is the correct numeric code, but I'm wondering whether some sort of halfway conversion is happening. On the following page, the numeric character reference for an apostrophe is written with the ampersand symbol in front (&#39;):

http://www.w3.org/MarkUp/html-spec/html-spec_13.html
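The ampersand is part of the reference syntax itself: a decimal character reference is `&#`, the character's code point, and `;`. The apostrophe's code point is 39 (hexadecimal 27, which gives the equivalent form `&#x27;`). A small illustrative sketch; the helper names are hypothetical:

```java
public class NumericReference {
    // Builds a decimal numeric character reference, e.g. '\'' -> "&#39;".
    public static String decimalRef(char c) {
        return "&#" + (int) c + ";";
    }

    // Builds the equivalent hexadecimal form, e.g. '\'' -> "&#x27;".
    public static String hexRef(char c) {
        return "&#x" + Integer.toHexString(c) + ";";
    }

    public static void main(String[] args) {
        System.out.println(decimalRef('\'')); // &#39;
        System.out.println(hexRef('\''));     // &#x27;
    }
}
```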

So if for some reason we now have &amp;#39; (the ampersand symbol plus "amp;" plus "#39;"), then I'm guessing the parser might translate &amp; into the ampersand symbol and, on that pass, not know what to do with "#39;", since it didn't see an ampersand symbol in front of it. What I don't know is why that would have happened in the first place!
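The guess above is consistent with how a single parse behaves: each parse removes exactly one layer of escaping. The sketch below assumes (this thread does not confirm it) that the comments value is escaped twice on the way out; `escapeText` is a hypothetical stand-in for whatever does the escaping, not XStream's actual implementation.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;

public class DoubleEscapeDemo {
    // Hypothetical minimal escaper for XML text content. '&' is replaced first
    // so the ampersands added by the later replacements are not re-escaped.
    public static String escapeText(String s) {
        return s.replace("&", "&amp;")
                .replace("<", "&lt;")
                .replace(">", "&gt;")
                .replace("'", "&#39;")
                .replace("\"", "&quot;");
    }

    // Parses the document once, as the receiving application would,
    // and returns the text content of the root element.
    public static String parseOnce(String xml) {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            Document doc = builder.parse(
                    new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            return doc.getDocumentElement().getTextContent();
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse: " + xml, e);
        }
    }

    public static void main(String[] args) {
        String once = escapeText("Stephen's");  // Stephen&#39;s
        String twice = escapeText(once);        // Stephen&amp;#39;s
        System.out.println(parseOnce("<comments>" + once + "</comments>"));  // Stephen's
        System.out.println(parseOnce("<comments>" + twice + "</comments>")); // Stephen&#39;s
    }
}
```

Escaping `Stephen's` once gives `Stephen&#39;s`; escaping that result again turns its ampersand into `&amp;`, giving `Stephen&amp;#39;s`, which matches what the second environment produced. One parse then yields the literal text `Stephen&#39;s`. If this is the cause, the fix would be to find where the value gets escaped before XStream sees it and let only one layer do the escaping.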

[ May 01, 2008: Message edited by: Stephen Huey ]

[ May 02, 2008: Message edited by: Stephen Huey ]
 