
Extracting a nested XML document

 
Will Myers
Ranch Hand
Posts: 383
Hi,
I have a bean that consumes an XML document which contains another XML document, with all the < and > tags replaced by HTML tags. I'm told this is standard practice. Can anyone point me in the right direction for extracting the inner XML document in the correct format? If I try to parse it using:



it works, but I then can't access any of the values...

 
Jimmy Clark
Ranch Hand
Posts: 2187
The '<' and '>' characters are tag delimiters. What do you mean by "replaced by HTML tags"? Can you provide an example?

 
Will Myers
Ranch Hand
Posts: 383
< becomes the HTML tag &lt; and > becomes &gt;

I would post an example, but this forum converts them to < and > and I don't know how to escape them.
 
Jimmy Clark
Ranch Hand
Posts: 2187
Thanks. &lt; and &gt; are HTML entities. They are not considered "tags".

Attempting to "nest" an XML document within another sounds like a bad design idea and conflicts with the core premise of XML. The difficulty you are encountering is a result of poor design.
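To illustrate the entity point, here is a minimal Java sketch. It assumes the JDK's built-in JAXP parser and a hypothetical outer document with made-up `outer`/`payload` element names. An XML parser decodes &lt; and &gt; when it reads character data, so the payload comes back as the string <inner><value>42</value></inner>, but only as text, with no child elements. That may be why the values can't be reached after parsing only the outer document.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class EntityDemo {
    // Hypothetical outer document: the inner XML has been escaped with &lt; and &gt;
    static final String OUTER =
        "<outer><payload>&lt;inner&gt;&lt;value&gt;42&lt;/value&gt;&lt;/inner&gt;</payload></outer>";

    static Element payloadElement(String xml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
            .newDocumentBuilder()
            .parse(new InputSource(new StringReader(xml)));
        return (Element) doc.getElementsByTagName("payload").item(0);
    }

    // The parser has already decoded &lt; and &gt; back into < and >
    static String payloadText(String xml) throws Exception {
        return payloadElement(xml).getTextContent();
    }

    // ...but the decoded markup is character data, so <payload> has no element children
    static int countElementChildren(String xml) throws Exception {
        NodeList children = payloadElement(xml).getChildNodes();
        int count = 0;
        for (int i = 0; i < children.getLength(); i++) {
            if (children.item(i).getNodeType() == Node.ELEMENT_NODE) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) throws Exception {
        System.out.println(payloadText(OUTER));          // <inner><value>42</value></inner>
        System.out.println(countElementChildren(OUTER)); // 0
    }
}
```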
 
William Brogden
Author and all-around good cowpoke
Rancher
Posts: 13064
Bad design! You got that right! For years I have been dealing with a client who got stuck with this design.

A CDATA section is used to hide a complete XML document text - to work with it I have to extract the entire CDATA section to a String, build an org.xml.sax.InputSource from the String and parse that to a DOM.

Then of course all of the normal org.w3c.dom and related methods work to access values.
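A rough sketch of the CDATA approach described above, assuming the standard JAXP DOM API. The envelope/body/order element names are hypothetical and used only for illustration. The steps are: read the CDATA section's text into a String, wrap it in an org.xml.sax.InputSource, and parse that into a second DOM.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

public class CdataExtract {
    // Hypothetical outer document carrying a complete XML document inside a CDATA section
    static final String OUTER =
        "<envelope><body><![CDATA[<order><id>A17</id><qty>3</qty></order>]]></body></envelope>";

    // Parse any XML string into a DOM via an org.xml.sax.InputSource
    static Document parse(String xml) throws Exception {
        return DocumentBuilderFactory.newInstance()
            .newDocumentBuilder()
            .parse(new InputSource(new StringReader(xml)));
    }

    static Document innerDocument(String outerXml) throws Exception {
        Document outer = parse(outerXml);
        Element body = (Element) outer.getElementsByTagName("body").item(0);
        // Step 1: pull the whole CDATA section out as a String
        String innerXml = body.getTextContent();
        // Step 2: parse that String as a document in its own right
        return parse(innerXml);
    }

    public static void main(String[] args) throws Exception {
        Document inner = innerDocument(OUTER);
        // Step 3: the usual org.w3c.dom methods now work on the inner document
        String id = inner.getElementsByTagName("id").item(0).getTextContent();
        System.out.println(id); // A17
    }
}
```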

Bill
 
Will Myers
Ranch Hand
Posts: 383
It's bad design, but there's not much I can do about that. I've got round it by using XPath to extract the inner XML I'm interested in, then replacing the &lt; and &gt; entities with < and > and working on the result as normal. Bit of a faff...
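A sketch of the XPath route, assuming javax.xml.xpath from the JDK and hypothetical element names (response, result, customer). When the inner document is escaped once with &lt;/&gt; in the outer XML, the string value that XPath returns is normally already decoded. In that case the inner XML can be parsed directly, and a manual replace is only needed if the payload was escaped twice (e.g. &amp;lt;).

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

public class XPathExtract {
    // Hypothetical response whose <result> element holds an escaped XML document
    static final String OUTER =
        "<response><result>&lt;customer&gt;&lt;name&gt;Ann&lt;/name&gt;&lt;/customer&gt;</result></response>";

    static String extractName(String outerXml) throws Exception {
        XPath xpath = XPathFactory.newInstance().newXPath();
        // String value of <result>; the parser has already turned &lt; and &gt; into < and >
        String innerXml = xpath.evaluate("/response/result",
            new InputSource(new StringReader(outerXml)));
        // Parse the extracted text as a separate document
        Document inner = DocumentBuilderFactory.newInstance()
            .newDocumentBuilder()
            .parse(new InputSource(new StringReader(innerXml)));
        // Query the inner document like any other
        return xpath.evaluate("/customer/name", inner);
    }

    public static void main(String[] args) throws Exception {
        System.out.println(extractName(OUTER)); // Ann
    }
}
```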
 