This is the FAQ page for the XML and Related Technologies forum. Contributions are welcome. Also see XmlLinks.
Q: The characters() method in my SAX parser doesn't return all the text (or is called more than once). What gives?
Here's what the Javadoc for that method says: "SAX parsers may return all contiguous character data in a single chunk, or they may split it into several chunks". William Brogden explains:
The characters() method may be called any number of times within a single element because the SAX parser only handles one bufferload of input characters at a time. It is up to the programmer to assemble the text properly.
I normally have a StringBuffer or StringBuilder reference that gets a new instance when the appropriate startElement() is hit and gets additions from each call to the characters() method. When endElement() occurs I use toString() to get the assembled characters and then work on the logic.
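The approach described above can be sketched roughly as follows. This is a minimal illustration, not William's actual code: the class name TextCollector, the collect() helper and the sample document are made up for this example, and it assumes the element of interest is not nested inside itself.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical handler that collects the complete text of every element with a given name.
class TextCollector extends DefaultHandler {
    private final String targetElement;                 // qualified name of the element we care about
    private final List<String> results = new ArrayList<>();
    private StringBuilder buffer;                       // non-null only while inside the target element

    TextCollector(String targetElement) {
        this.targetElement = targetElement;
    }

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        if (qName.equals(targetElement)) {
            buffer = new StringBuilder();               // fresh buffer for each occurrence
        }
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        // May be called several times for one element; append every chunk.
        if (buffer != null) {
            buffer.append(ch, start, length);
        }
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        if (buffer != null && qName.equals(targetElement)) {
            results.add(buffer.toString());             // the text is complete only now
            buffer = null;
        }
    }

    List<String> getResults() {
        return results;
    }

    // Convenience helper (an assumption of this sketch): parse a string and return the collected texts.
    static List<String> collect(String xml, String element) throws Exception {
        TextCollector handler = new TextCollector(element);
        SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), handler);
        return handler.getResults();
    }

    public static void main(String[] args) throws Exception {
        String xml = "<items><name>Tom &amp; Jerry</name><name>Spike</name></items>";
        System.out.println(collect(xml, "name"));
    }
}
```

Entity references such as &amp; are a common point at which parsers split the text into several characters() calls, so the first name element in the sample is a handy test case. Because each chunk is appended and the string is only read in endElement(), the result is the same however the parser chooses to split the text.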
XQJ - a standard API for XQuery processing in Java
XML Hammer "is a free and open-source tool that simplifies elementary XML actions like checking for well-formedness, validation, transformation and XPath searches using any JAXP implementation".
Xerces is a powerful XML parser; a version of it is now bundled with the JRE.
Crimson is a (now obsolete) XML parser that supports DOM, SAX and JAXP 1.1. It was used in the JRE before the switch to Xerces, and is a useful example for studying the inner workings of an XML parser.