This is the FAQ page for the
XML and Related Technologies forum. Contributions are welcome. Also see
XmlLinks.
Q: The characters() method in my SAX parser doesn't return all the text (or is called more than once). What gives?
Here's what the javadocs of that method say:
SAX parsers may return all contiguous character data in a single chunk, or they may split it into several chunks.
William Brogden explains:
The characters() method may be called any number of times within a single element because the SAX parser only handles one bufferload of input characters at a time. It is up to the programmer to assemble the text properly.
I normally have a StringBuffer or StringBuilder reference that gets a new instance when the appropriate startElement() is hit and gets additions from each call to the characters() method. When endElement() occurs I use toString() to get the assembled characters and then work on the logic.
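The approach described above can be sketched roughly as follows. This is a minimal illustration, not a definitive implementation: the class name TitleCollector, the element name "title" and the helper parseTitles() are made up for the example, and it assumes the default (non-namespace-aware) JAXP parser and that title elements are not nested inside each other.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical handler: collects the text of every <title> element.
public class TitleCollector extends DefaultHandler {
    private final List<String> titles = new ArrayList<>();
    // Non-null only while the parser is inside a <title> element.
    private StringBuilder text;

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        if ("title".equals(qName)) {
            text = new StringBuilder(); // fresh buffer for this element
        }
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        // May be called several times per element, so append rather than assign.
        if (text != null) {
            text.append(ch, start, length);
        }
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        if ("title".equals(qName) && text != null) {
            titles.add(text.toString()); // the complete text is only known here
            text = null;
        }
    }

    public List<String> getTitles() {
        return titles;
    }

    // Convenience helper (an assumption of this sketch): parse an XML string.
    public static List<String> parseTitles(String xml) throws Exception {
        TitleCollector handler = new TitleCollector();
        SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), handler);
        return handler.getTitles();
    }

    public static void main(String[] args) throws Exception {
        String xml = "<books><book><title>XML &amp; Java</title></book></books>";
        System.out.println(parseTitles(xml)); // prints [XML & Java]
    }
}
```

Text containing an entity reference such as &amp;amp; is a typical case where a parser may deliver the content in more than one characters() call, which is why the handler only reads the buffer in endElement().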
JavaDoc:org.xml.sax.ContentHandler
Java Code Examples
HowToValidateXmlAgainstSchema
HowToValidateXmlAgainstAnySchema (or DTD or Relax-NG)
HowToPrettyPrintXmlWithXsl
HowToPrettyPrintXmlWithJava
DocumentToFile
DocumentToString
DocumentToByteArray
StringToDocument
ByteArrayToDocument
GetElementValueByNameUsingDom
GetNodeValue
Articles and introductions
General
Introduction to XML
A Technical Introduction to XML
Specifically about Java
JAXP is the Java standard for XML processing; it is part of the JRE.
Introduction to XML processing in Java
Introduction to DOM and SAX Parsing
Introduction to XML and XML processing in Java
Using JAXP to process XML
Load, Save and Filter XML Documents Using the DOM Level 3 API
Unofficial JAXP FAQ
JAXP trail in the Oracle Java Tutorial
An Introduction to StAX (Streaming API for XML) (new in JAXP 1.4 and Java 6)
XQJ - a standard API for XQuery processing in Java
Software
XML Hammer "is a free and open-source tool that simplifies elementary XML actions like checking for well-formedness, validation, transformation and XPath searches using any JAXP implementation".
Xerces is a powerful XML parser that is now part of the JRE.
Crimson is a (now obsolete) XML parser that supports DOM, SAX and JAXP 1.1. It was used in the JRE before the switch to Xerces, and is a useful example for studying the inner workings of an XML parser.
dom4j, JDOM and XOM are alternative Java DOM APIs.
Xalan and Saxon are XSL-T processors.
Apache FOP is an XSL-FO processor that can output numerous formats, including PDF, PS, PCL, AFP, Print, AWT and PNG, and to a lesser extent, RTF and TXT.
Apache Santuario implements XML Signature and XML Encryption.
JAXB is a Java <--> XML binding library.
Apache Commons Digester is an XML --> Java mapping library.
NekoHTML, HtmlCleaner and TagSoup are libraries that clean up HTML and transform it to XML (thus allowing DOM and SAX to work with them).
A list of open source XML Diff and Patch tools
Certifications
The formerly available IBM XML exams 141 and 142 were retired on 12/31/2012. Online certifications are available at
http://www.brainbench.com/ and
http://www.xmlmaster.org/en/.
These exam questions may help you gauge your XML knowledge, even if the associated exam is no longer available:
XML Design questions (by Ajith Kallambella)
Core XML (by Mapraputa Is)
DTD (by Sanjay Mishra and Dan Chisham)
DOM/SAX (by Kris VidhyaSagar)
XML 141 mock exam (by Shashank Tanksali)
IBM's XML Architecture prep guide
CategoryFaq XmlLinks