Hi! I am having a problem parsing XML with the IBM SAX parser API. I parse the XML by overriding the startElement(), endElement() and characters() methods. Everything works fine, except when I get the following content in the XML:

    <SECTION TYPE="COMTEX-JIMDASH">
      <PARAGRAPH TYPE="NORMAL">APO Priority=r APO Category=1700 "BLOCK" "BLOCK" KEYWORD: WASHINGTON "BLOCK" SUBJECT CODE: 1700</PARAGRAPH>
      <PARAGRAPH TYPE="NORMAL" />
    </SECTION>

Please note the "BLOCK" markers above. By "BLOCK" I mean a square figure (I am not able to type or paste that character here).

When the parser reaches this content, control does not go to the characters() method after startElement(), as it normally would. Instead the parser throws an exception with the message "org.xml.sax.SAXParseException occured while parsing XML Invalid XML character. (Unicode: 0x8)", and control passes to the fatalError(SAXParseException e) method of the HandlerBase class.

First, I am looking for a way to read these special characters. Second, when I get a special character like this one, I want to continue parsing the XML. But according to the API documentation:

    The default implementation throws a SAXParseException. Application writers may override this method in a subclass if they need to take specific actions for each fatal error (such as collecting all of the errors into a single report): in any case, the application must stop all regular processing when this method is invoked, since the document is no longer reliable, and the parser may no longer report parsing events.
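For reference, here is a minimal sketch of the handler setup described above. It assumes the JAXP SAXParserFactory that ships with the JDK rather than the IBM parser, and extends org.xml.sax.helpers.DefaultHandler (the SAX2 successor to HandlerBase); the class name ParagraphHandler and its parse helper are hypothetical. It shows that a U+0008 character in element content is reported to fatalError(), after which parsing stops, because the document is not well-formed.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical handler mirroring the question's setup.
public class ParagraphHandler extends DefaultHandler {
    // Names of elements reported to startElement(), in document order.
    final List<String> elements = new ArrayList<>();
    // Text reported to characters(); the parser may deliver it in several chunks.
    final StringBuilder text = new StringBuilder();
    // Message of the fatal error reported by the parser, or null if none.
    String fatalMessage;

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        elements.add(qName);
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        text.append(ch, start, length);
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        // Nothing to do in this sketch.
    }

    @Override
    public void fatalError(SAXParseException e) throws SAXException {
        fatalMessage = e.getMessage();
        // The document is not well-formed: record the error, then stop as the SAX contract requires.
        throw e;
    }

    /** Parses xml with a fresh handler and returns it; a fatal error ends parsing early. */
    static ParagraphHandler parse(String xml) throws Exception {
        ParagraphHandler handler = new ParagraphHandler();
        try {
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), handler);
        } catch (SAXException e) {
            // Fatal errors are recorded in fatalError() before they propagate here.
        }
        return handler;
    }

    public static void main(String[] args) throws Exception {
        // "\b" is U+0008, the character from the question; XML 1.0 does not allow it.
        ParagraphHandler h = parse("<PARAGRAPH TYPE=\"NORMAL\">KEYWORD:\b WASHINGTON</PARAGRAPH>");
        System.out.println("fatal error: " + h.fatalMessage);
    }
}
```

Overriding fatalError() can collect or log the error, but it cannot make the parser resume, which matches the documentation quoted above.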
I think you are hitting an illegal character: 0x08 is the ASCII backspace control character, which XML 1.0 does not allow, and that is why it shows up as a block. If you can't remove the illegal character at the source, you may have to run your input file through a "filter" that removes illegal characters. Look at the java.io.FilterInputStream class: you could interpose a filter between your source and the XML parser.
Bill
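A minimal sketch of the filter Bill suggests, assuming the input is UTF-8 or a single-byte ASCII-compatible encoding such as ISO-8859-1. In UTF-8, bytes below 0x20 only ever encode those control characters, so dropping them cannot corrupt a multi-byte sequence; this does not hold for UTF-16 input. The class name XmlControlCharFilter and its parseText helper are hypothetical. The filter drops every control byte below 0x20 except tab, line feed and carriage return, the only characters in that range XML 1.0 allows; other illegal characters, such as U+FFFE, are not handled.

```java
import java.io.ByteArrayInputStream;
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical filter: removes control bytes that XML 1.0 forbids (assumes UTF-8 or single-byte input).
public class XmlControlCharFilter extends FilterInputStream {

    public XmlControlCharFilter(InputStream in) {
        super(in);
    }

    // Below 0x20, XML 1.0 allows only tab (0x09), LF (0x0A) and CR (0x0D).
    static boolean isIllegal(int b) {
        return b >= 0 && b < 0x20 && b != 0x09 && b != 0x0A && b != 0x0D;
    }

    @Override
    public int read() throws IOException {
        int b;
        do {
            b = super.read(); // -1 at end of stream is never treated as illegal
        } while (isIllegal(b));
        return b;
    }

    @Override
    public int read(byte[] buf, int off, int len) throws IOException {
        int n;
        do {
            n = super.read(buf, off, len);
            if (n <= 0) {
                return n; // end of stream, or len == 0
            }
            // Compact the chunk in place, keeping only legal bytes.
            int kept = off;
            for (int i = off; i < off + n; i++) {
                if (!isIllegal(buf[i] & 0xFF)) {
                    buf[kept++] = buf[i];
                }
            }
            n = kept - off;
        } while (n == 0); // chunk was all illegal bytes: read again instead of returning 0
        return n;
    }

    /** Parses the filtered stream with SAX and returns all text reported to characters(). */
    static String parseText(InputStream raw) throws Exception {
        StringBuilder text = new StringBuilder();
        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void characters(char[] ch, int start, int length) {
                text.append(ch, start, length);
            }
        };
        SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new XmlControlCharFilter(raw)), handler);
        return text.toString();
    }

    public static void main(String[] args) throws Exception {
        byte[] xml = "<PARAGRAPH TYPE=\"NORMAL\">KEYWORD:\b WASHINGTON</PARAGRAPH>"
                .getBytes(StandardCharsets.UTF_8);
        System.out.println(parseText(new ByteArrayInputStream(xml))); // prints KEYWORD: WASHINGTON
    }
}
```

In a real program you would wrap the file stream the same way, for example parser.parse(new InputSource(new XmlControlCharFilter(new FileInputStream(path))), handler), so that characters() receives the text with the control characters silently removed. If you would rather keep a visible marker, write a space in place of the illegal byte instead of dropping it.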