My SAX parser (Xerces) is failing when parsing, complaining about Unicode: 0x1d. I am reading from a file (InputSource), and have set the encoding to UTF-8. Anyone have any ideas? Thanks a million!
Markup inside CDATA sections is not parsed by the parser. <someTag> <![CDATA[ some content goes here ]]> </someTag> I am not sure illegal symbols are allowed in CDATA sections, though. What does this symbol represent in your data? If it's part of binary data, you should probably use Base64 encoding to include it in an XML document.
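A minimal sketch of the Base64 route, assuming Java 8 or later (for java.util.Base64). The class name Base64Payload is hypothetical and only for illustration. The encoded text uses only letters, digits, '+', '/' and '=', so it is safe to place inside an element.

```java
import java.util.Base64;

// Hypothetical helper, not part of Xerces or of the original poster's code.
final class Base64Payload {

    private Base64Payload() {}

    // Encodes raw bytes as Base64 text that can be written as element content.
    static String encode(byte[] raw) {
        return Base64.getEncoder().encodeToString(raw);
    }

    // Decodes element content back into the original bytes.
    // trim() drops whitespace the document may carry around the value.
    static byte[] decode(String text) {
        return Base64.getDecoder().decode(text.trim());
    }
}
```

The writer would emit <someTag> followed by Base64Payload.encode(bytes) and </someTag>, and the reader would call Base64Payload.decode on the element text collected in the SAX characters() callback.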
Thanks for your help, Mapraputa. I had already tried enclosing the offending text in a CDATA section, but the parser still complains. The character itself is: ∝, and I'm sure there must be some way to make the parser skip it? Thanks again.
Having done some research, I discovered that this is a control character. While it is a valid Unicode character (and encodes without trouble in UTF-8), it is not a legal character in an XML 1.0 document. Control characters are in the range U+0000 to U+001F, and of these XML 1.0 permits only tab (U+0009), line feed (U+000A) and carriage return (U+000D). Most of them are written out as '?'. 0x1d (Group Separator), however, is not escaped, and therefore Xerces cannot parse it. I have written a util class that escapes the Unicode control chars before parsing, and this resolved my problem. Thanks for all your help.
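The util class itself was not posted, so the following is only a sketch of what such a helper might look like, not the actual code. The names XmlCharFilter, isLegalXml10 and escapeIllegal are made up, and the [U+001D] marker format is an arbitrary choice. The legality check follows the Char production of XML 1.0.

```java
// Hypothetical helper; the poster's real utility class was not shown.
final class XmlCharFilter {

    private XmlCharFilter() {}

    // True if the code point is allowed by the XML 1.0 Char production.
    static boolean isLegalXml10(int cp) {
        return cp == 0x9 || cp == 0xA || cp == 0xD
                || (cp >= 0x20 && cp <= 0xD7FF)
                || (cp >= 0xE000 && cp <= 0xFFFD)
                || (cp >= 0x10000 && cp <= 0x10FFFF);
    }

    // Replaces each illegal code point with a visible marker such as [U+001D].
    // The marker format is an assumption; dropping the character is another option.
    static String escapeIllegal(String in) {
        StringBuilder out = new StringBuilder(in.length());
        for (int i = 0; i < in.length(); ) {
            int cp = in.codePointAt(i);
            if (isLegalXml10(cp)) {
                out.appendCodePoint(cp);
            } else {
                out.append(String.format("[U+%04X]", cp));
            }
            i += Character.charCount(cp);
        }
        return out.toString();
    }
}
```

One way to use it is to read the file into a String, pass it through XmlCharFilter.escapeIllegal, and hand the result to the parser via new InputSource(new StringReader(cleaned)). Note that writing the character as a numeric reference (&#x1D;) would not help, since XML 1.0 rejects that form as well.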