I have an XML document (around 2 MB) that I need to parse using JAXB, but I have no control over how it is created. The XML comes from a third party.
Unfortunately, the XML contains things like:
<name>James & Colin</name>
When I fed the XML to the JAXB parser, it gave me this error:
"The entity name must immediately follow the '&' in the entity reference"
Is there a workaround for this in JAXB? I need to parse the <name> element as "James & Colin".
When I googled for a solution, I found suggestions like changing & to &amp;. But I don't have control over the creation of the XML.
Yes, that's correct. The solution is to correct the XML so that the ampersand in the text node is escaped properly.
There's no workaround for malformed XML, which is what you have there. If you don't have control over creating the XML, then send it back to whoever does have control and get them to fix it. In the HTML world it may be acceptable to generate malformed HTML, and browsers will attempt to parse it, but in the XML world it doesn't work that way. Malformed XML doesn't get parsed. That's a rule of XML.
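To see the rule in action without JAXB, here is a minimal sketch using the plain JDK DOM parser. The class and method names (WellFormedCheck, isWellFormed) are made up for illustration. JAXB unmarshalling relies on an underlying XML parser, so I would expect it to reject the same input for the same reason.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

// Hypothetical helper, not part of JAXB: reports whether a string is well-formed XML.
class WellFormedCheck {
    static boolean isWellFormed(String xml) {
        try {
            DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            // The parser refuses malformed input; it does not try to guess.
            return false;
        } catch (ParserConfigurationException | IOException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(isWellFormed("<name>James & Colin</name>"));     // false
        System.out.println(isWellFormed("<name>James &amp; Colin</name>")); // true
    }
}
```

The parser may also print its own fatal error message to stderr for the first call; that is the default error handler talking, not this code.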
And then they are going to have "A&P" in a text element, which that particular hack won't catch. Or they're going to have "<" unescaped in a text element. Or something else. There's really no future in trying to clean up other people's malformed XML.
There are only five XML special characters. Writing a good "scrubber" to clean up the file to make it compliant is a solution, when better alternatives are not possible or cost-prohibitive, or there are political obstacles.
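For the bare-ampersand case only, a scrubber could look something like the sketch below. This is an assumption about what such a tool might do, not a complete solution: the class name AmpersandScrubber and the method scrub are invented here, and it escapes any & that does not already begin one of the five predefined entities or a numeric character reference.

```java
import java.util.regex.Pattern;

// Hypothetical pre-processing step to run on the raw text BEFORE handing it to JAXB.
class AmpersandScrubber {
    // An '&' that is NOT followed by amp; lt; gt; quot; apos; or a numeric
    // reference such as #38; or #x26; is treated as an unescaped ampersand.
    private static final Pattern BARE_AMPERSAND =
            Pattern.compile("&(?!(?:amp|lt|gt|quot|apos|#[0-9]+|#x[0-9a-fA-F]+);)");

    static String scrub(String rawXml) {
        return BARE_AMPERSAND.matcher(rawXml).replaceAll("&amp;");
    }

    public static void main(String[] args) {
        System.out.println(scrub("<name>James & Colin</name>")); // <name>James &amp; Colin</name>
        System.out.println(scrub("<store>A&P</store>"));         // <store>A&amp;P</store>
    }
}
```

Known gaps in this sketch: it does nothing about an unescaped < in text, it would wrongly escape references to custom entities declared in a DTD, and it would alter ampersands inside CDATA sections, where they are legal as-is.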
Sure it is. I just don't think you should present it as such without mentioning that a good scrubber would be extremely difficult to write.
Thanks for sharing your opinion. I, on the other hand, would not mention that it would be extremely difficult to write because I do not know the OP's abilities or the resources that he/she has access to. Your view that it would be "difficult" is subjective and based on your interpretation of the difficulty.
Secondly, I have written many such applications rather easily, over and over again. So, from my perspective it would be an easy task. But it would be wrong for me to describe it as such, again because subjective opinions are ill-placed in Internet forums involving mostly strangers.
Of course it's my subjective opinion. (There isn't any other kind of opinion as far as I know.) But it's not a worthless opinion either... you didn't mean to suggest that, did you? I also disagree with your opinion that it would be easy to write a scrubber for bad XML, but then we haven't agreed on what this scrubber should really do, so there isn't really much to agree or disagree about.
And of course I didn't mean that the OP should attempt that task himself; it is possible to make some estimate of his abilities, for example he doesn't know a basic feature of XML, namely escaping, so he's a beginner in the XML world. Writing a scrubber shouldn't be something he's attempting just yet.
Of course if it's really not too hard to do such a thing then it should already exist and be posted on the Internet. I haven't looked for such a thing because my position is the same as that of the designers of XML, namely that it is the responsibility of the creator of an XML document to make sure it is well-formed.