
Xerces SAX not parsing a Unicode char

 
Greenhorn
Posts: 8
My SAX parser (Xerces) fails while parsing, complaining about "Unicode: 0x1d".
I am reading from a file through an InputSource and have set its encoding to UTF-8.
Does anyone have any ideas?
Thanks a million!
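A minimal way to reproduce the error, assuming the JDK's built-in SAX parser (which is derived from Xerces) rather than a standalone Xerces jar; the class name ReproduceInvalidChar is hypothetical. It suggests the UTF-8 setting is not the problem: U+001D is encoded as the single byte 0x1D in UTF-8, yet the parser still rejects the document.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXParseException;
import org.xml.sax.helpers.DefaultHandler;

final class ReproduceInvalidChar {

    /** Parses UTF-8 bytes with SAX; returns null on success, or the parser's error message. */
    static String tryParse(byte[] utf8Xml) throws Exception {
        SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
        InputSource source = new InputSource(new ByteArrayInputStream(utf8Xml));
        source.setEncoding("UTF-8"); // correct for these bytes, and not the cause of the failure
        try {
            // DefaultHandler also acts as the ErrorHandler; its fatalError() rethrows the exception
            parser.parse(source, new DefaultHandler());
            return null;
        } catch (SAXParseException e) {
            return e.getMessage();
        }
    }

    public static void main(String[] args) throws Exception {
        // U+001D (Group Separator) is encoded as the single byte 0x1D in UTF-8...
        String bad = "<data>a" + (char) 0x1D + "b</data>";
        // ...but XML 1.0 does not allow that character anywhere in a document.
        System.out.println(tryParse(bad.getBytes(StandardCharsets.UTF_8)));               // the parser's error message
        System.out.println(tryParse("<data>ab</data>".getBytes(StandardCharsets.UTF_8))); // null
    }
}
```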
 
Leverager of our synergies
Posts: 10065
"This character cannot be used in XML documents"
Zvon.org
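The rule behind that message is the Char production in section 2.2 of the XML 1.0 specification: tab, line feed, carriage return, U+0020 to U+D7FF, U+E000 to U+FFFD, and U+10000 to U+10FFFF. Anything else, including U+001D, is illegal. A small Java check of that rule might look like this (the class and method names are made up for illustration):

```java
final class XmlChars {

    /** True if the code point matches the XML 1.0 "Char" production. */
    static boolean isXmlChar(int cp) {
        return cp == 0x9 || cp == 0xA || cp == 0xD
                || (cp >= 0x20 && cp <= 0xD7FF)
                || (cp >= 0xE000 && cp <= 0xFFFD)
                || (cp >= 0x10000 && cp <= 0x10FFFF);
    }

    /** Returns the index of the first illegal character in s, or -1 if s is legal XML 1.0 text. */
    static int firstIllegalIndex(String s) {
        for (int i = 0; i < s.length(); ) {
            int cp = s.codePointAt(i); // unpaired surrogates come back as-is and fail the check
            if (!isXmlChar(cp)) {
                return i;
            }
            i += Character.charCount(cp);
        }
        return -1;
    }

    public static void main(String[] args) {
        System.out.println(isXmlChar(0x1D));                              // false
        System.out.println(isXmlChar('\t'));                              // true
        System.out.println(firstIllegalIndex("ab" + (char) 0x1D + "cd")); // 2
    }
}
```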
 
karen obrien
Greenhorn
Posts: 8
Is there any way to stop the parser from parsing the data in specific XML elements, without explicitly escaping the illegal character?
Thanks.
 
Mapraputa Is
Leverager of our synergies
Posts: 10065
The contents of CDATA sections are not parsed as markup, so characters such as < and & can appear in them unescaped.
<someTag>
<![CDATA[
some content goes here
]]>
</someTag>
I am not sure illegal characters are allowed in CDATA sections, though.
What does this character represent in your data? If it is part of binary data, you should probably use Base64 encoding to include that data in an XML document.
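A minimal sketch of the Base64 suggestion, using java.util.Base64 (Java 8 and later). The <payload> element and its encoding attribute are hypothetical names, not anything Xerces requires; the receiving side would decode the element's text after parsing.

```java
import java.util.Arrays;
import java.util.Base64;

final class Base64Payload {

    /** Wraps raw bytes in a hypothetical <payload> element as Base64 text. */
    static String toXmlElement(byte[] raw) {
        // Base64 output uses only A-Z, a-z, 0-9, '+', '/' and '=', all legal in XML text.
        return "<payload encoding=\"base64\">" + Base64.getEncoder().encodeToString(raw) + "</payload>";
    }

    /** Decodes the element's accumulated text content back into the original bytes. */
    static byte[] fromElementText(String text) {
        return Base64.getDecoder().decode(text.trim());
    }

    public static void main(String[] args) {
        byte[] raw = {'a', 0x1D, 'b'}; // contains the Group Separator byte
        System.out.println(toXmlElement(raw));                           // <payload encoding="base64">YR1i</payload>
        System.out.println(Arrays.equals(raw, fromElementText("YR1i"))); // true
    }
}
```

Note that a SAX characters() callback may deliver the element's text in several chunks, so the handler should concatenate them before decoding.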
 
karen obrien
Greenhorn
Posts: 8
Thanks for your help, Mapraputa.
I had already tried enclosing the offending text in a CDATA section, but the parser still complains.
The character itself is ∝, and surely there must be some way to make the parser skip it?
Thanks again.
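That matches the XML 1.0 rules: CDATA only switches off markup recognition; it does not relax which characters may appear. A quick check, assuming the JDK's built-in SAX parser in place of a standalone Xerces jar (the class name is hypothetical):

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXParseException;
import org.xml.sax.helpers.DefaultHandler;

final class CdataDoesNotHelp {

    /** Returns true if the JDK's SAX parser accepts the document as well-formed XML. */
    static boolean parses(String xml) throws Exception {
        try {
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), new DefaultHandler());
            return true;
        } catch (SAXParseException e) {
            System.out.println("Rejected: " + e.getMessage());
            return false;
        }
    }

    public static void main(String[] args) throws Exception {
        char gs = (char) 0x1D; // Group Separator
        // CDATA switches off markup recognition, so '<' and '&' are fine inside it...
        System.out.println(parses("<someTag><![CDATA[a < b & c]]></someTag>"));    // true
        // ...but the character-legality rules still apply, so U+001D is rejected.
        System.out.println(parses("<someTag><![CDATA[a" + gs + "b]]></someTag>")); // false
    }
}
```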
 
karen obrien
Greenhorn
Posts: 8
Having done some research, I discovered that this is a control character: it is a valid Unicode character, and it encodes fine in UTF-8, but XML 1.0 does not allow it in documents.
Control characters are in the range U+0000 to U+001F, and apart from tab, line feed, and carriage return none of them are legal in XML 1.0. Most of them are written out as '?', but 0x1D (Group Separator) is not escaped, so Xerces cannot parse it.
I have written a utility class that escapes these control characters, and this resolved my problem.
Thanks for all your help.
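The utility class itself is not shown in this thread. A minimal sketch of one way to do it, assuming the goal is to keep the offending characters visible as plain text rather than drop them; the class name XmlControlCharEscaper and the marker format (a backslash, the letter u, and four hex digits) are hypothetical choices. It would run over the text before that text is written into the XML document. A numeric character reference such as &#x1D; would not help here, because XML 1.0 rejects references to illegal characters too.

```java
final class XmlControlCharEscaper {

    /** XML 1.0 "Char" production: tab, LF, CR, and the listed Unicode ranges. */
    private static boolean isXmlChar(int cp) {
        return cp == 0x9 || cp == 0xA || cp == 0xD
                || (cp >= 0x20 && cp <= 0xD7FF)
                || (cp >= 0xE000 && cp <= 0xFFFD)
                || (cp >= 0x10000 && cp <= 0x10FFFF);
    }

    /**
     * Replaces each character that XML 1.0 forbids with a plain-text marker made of a
     * backslash, the letter u and four hex digits (U+001D becomes six ordinary characters).
     */
    static String escape(String text) {
        StringBuilder out = new StringBuilder(text.length());
        for (int i = 0; i < text.length(); ) {
            int cp = text.codePointAt(i);
            if (isXmlChar(cp)) {
                out.appendCodePoint(cp);
            } else {
                out.append(String.format("\\u%04X", cp));
            }
            i += Character.charCount(cp);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String dirty = "field1" + (char) 0x1D + "field2";
        // prints the text with U+001D replaced by its six-character marker
        System.out.println(escape(dirty));
    }
}
```

If the original text can itself contain a backslash followed by u, the receiver cannot tell markers apart from real text; escaping backslashes as well would make the scheme reversible.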
 
Greenhorn
Posts: 15
Hi,
I am facing a similar problem. Could you please share the solution?
Shekar
My ID is: pandresatya@indiatimes.com
 