This is the FAQ page for the XML and Related Technologies forum. Contributions are welcome. Also see XmlLinks.

Q: The characters() method in my SAX parser doesn't return all the text (or is called more than once). What gives?

Here's what the Javadoc of that method says: SAX parsers may return all contiguous character data in a single chunk, or they may split it into several chunks. William Brogden explains:

The characters() method may be called any number of times within a single element because the SAX parser only handles one bufferload of input characters at a time. It is up to the programmer to assemble the text properly.

I normally have a StringBuffer or StringBuilder reference that gets a new instance when the appropriate startElement() is hit and gets additions from each call to the characters() method. When endElement() occurs I use toString() to get the assembled characters and then work on the logic.
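A minimal sketch of the approach William describes, using only the JAXP classes bundled with the JRE. The class name `TitleCollector` and the `title` element are hypothetical. The sample input contains an `&amp;` entity because parsers commonly deliver the text on either side of an entity reference in separate `characters()` calls, which is exactly the case the buffer is there to handle.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;

import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

public class TitleCollector extends DefaultHandler {

    // Text accumulated for the current <title>; null while outside a <title> element.
    private StringBuilder text;
    private final List<String> titles = new ArrayList<>();

    @Override
    public void startElement(String uri, String localName, String qName, Attributes attributes) {
        if ("title".equals(qName)) {
            text = new StringBuilder(); // fresh buffer for each <title>
        }
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        if (text != null) {
            text.append(ch, start, length); // may be called several times per element
        }
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        if ("title".equals(qName) && text != null) {
            titles.add(text.toString()); // only here is the element's text complete
            text = null;
        }
    }

    public List<String> getTitles() {
        return titles;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<books><title>XML &amp; Java</title><title>SAX in depth</title></books>";
        SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
        TitleCollector handler = new TitleCollector();
        parser.parse(new InputSource(new StringReader(xml)), handler);
        System.out.println(handler.getTitles());
    }
}
```

Running `main` prints `[XML & Java, SAX in depth]`, regardless of how many chunks the parser used for each title.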


Java Code Examples

  • HowToValidateXmlAgainstSchema
  • HowToValidateXmlAgainstAnySchema (or DTD or Relax-NG)
  • HowToPrettyPrintXmlWithXsl
  • HowToPrettyPrintXmlWithJava
  • DocumentToFile
  • DocumentToString
  • DocumentToByteArray
  • StringToDocument
  • ByteArrayToDocument
  • GetElementValueByNameUsingDom
  • GetNodeValue
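The pages listed above are separate wiki topics. As a rough combined sketch of the usual shape of three of them (StringToDocument, GetElementValueByNameUsingDom and DocumentToString), assuming only the JAXP classes bundled with the JRE, the following may help. The class name, method names and sample XML are hypothetical and not taken from those pages.

```java
import java.io.StringReader;
import java.io.StringWriter;

import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;

import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class DomRoundTrip {

    // StringToDocument: parse XML text into a DOM tree.
    public static Document stringToDocument(String xml) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true);
        DocumentBuilder builder = factory.newDocumentBuilder();
        return builder.parse(new InputSource(new StringReader(xml)));
    }

    // GetElementValueByNameUsingDom: text of the first element with the given tag name, or null.
    public static String getElementValue(Document doc, String tagName) {
        NodeList nodes = doc.getElementsByTagName(tagName);
        return nodes.getLength() > 0 ? nodes.item(0).getTextContent() : null;
    }

    // DocumentToString: serialize the DOM tree back to text with an identity transform.
    public static String documentToString(Document doc) throws Exception {
        Transformer transformer = TransformerFactory.newInstance().newTransformer();
        transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
        // Requests pretty-printing; the exact indentation is implementation-dependent.
        transformer.setOutputProperty(OutputKeys.INDENT, "yes");
        StringWriter out = new StringWriter();
        transformer.transform(new DOMSource(doc), new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        Document doc = stringToDocument("<config><name>demo</name><port>8080</port></config>");
        System.out.println(getElementValue(doc, "port"));
        System.out.println(documentToString(doc));
    }
}
```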

Articles and introductions

  • Introduction to XML
  • A Technical Introduction to XML

Specifically about Java

  • JAXP (Java API for XML Processing) is the Java standard for XML processing; it is part of the JRE.
  • Introduction to XML processing in Java
  • Introduction to DOM and SAX Parsing
  • Introduction to XML and XML processing in Java
  • Using JAXP to process XML
  • Load, Save and Filter XML Documents Using the DOM Level 3 API
  • Unofficial JAXP FAQ
  • JAXP trail in the Oracle Java Tutorial
  • An Introduction to StAX (Streaming API for XML) (new in JAXP 1.4 and Java 6)
  • XQJ - a standard API for XQuery processing in Java
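StAX takes a pull approach: the application asks the parser for the next event instead of receiving callbacks. A minimal sketch, assuming the StAX implementation bundled with Java 6 or later; the class name `StaxTitles` and the `title` element are hypothetical. Relevant to the SAX question at the top of this page: `XMLStreamReader.getElementText()` returns the whole text of a text-only element as one string, so no manual buffering is needed there.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

public class StaxTitles {

    public static List<String> readTitles(String xml) throws XMLStreamException {
        XMLInputFactory factory = XMLInputFactory.newInstance();
        XMLStreamReader reader = factory.createXMLStreamReader(new StringReader(xml));
        List<String> titles = new ArrayList<>();
        try {
            while (reader.hasNext()) {
                // The application pulls the next event rather than being called back.
                if (reader.next() == XMLStreamConstants.START_ELEMENT
                        && "title".equals(reader.getLocalName())) {
                    // Reads up to the matching end tag and returns all of the element's
                    // text in one string; throws if the element contains child elements.
                    titles.add(reader.getElementText());
                }
            }
        } finally {
            reader.close();
        }
        return titles;
    }

    public static void main(String[] args) throws XMLStreamException {
        System.out.println(readTitles("<books><title>XML &amp; Java</title><title>StAX</title></books>"));
    }
}
```

Running `main` prints `[XML & Java, StAX]`.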

Software

  • XML Hammer "is a free and open-source tool that simplifies elementary XML actions like checking for well-formedness, validation, transformation and XPath searches using any JAXP implementation".
  • Xerces is a powerful XML parser; a repackaged version of it is the default parser built into the JRE.
  • Crimson is a (now obsolete) XML parser that supports DOM, SAX and JAXP 1.1. It was used in the JRE before the switch to Xerces, and is a useful example for studying the inner workings of an XML parser.
  • dom4j, JDOM and XOM are alternative tree-based XML object models for Java, used in place of the W3C DOM API.
  • Xalan and Saxon are XSLT processors.
  • Apache FOP is an XSL-FO processor that can output numerous formats, including PDF, PS, PCL, AFP, Print, AWT and PNG, and to a lesser extent, RTF and TXT.
  • Apache Santuario implements XML Signature and XML Encryption.
  • JAXB is a Java <--> XML binding library.
  • Apache Commons Digester is an XML --> Java mapping library.
  • NekoHTML, HtmlCleaner and TagSoup are libraries that clean up HTML and transform it to XML, so that the result can be processed with DOM and SAX.
  • A list of open source XML Diff and Patch tools
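Xalan and Saxon are normally driven through the JAXP `javax.xml.transform` API, so the calling code does not depend on which processor is installed. A minimal sketch, assuming an XSLT 1.0 stylesheet held in a string; the class name `XsltDemo` and the stylesheet itself are hypothetical.

```java
import java.io.StringReader;
import java.io.StringWriter;

import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

public class XsltDemo {

    // Hypothetical stylesheet: outputs the text of every <title> element, one per line.
    private static final String XSL =
        "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
        + "<xsl:output method='text'/>"
        + "<xsl:template match='/'>"
        + "<xsl:for-each select='//title'>"
        + "<xsl:value-of select='.'/><xsl:text>&#10;</xsl:text>"
        + "</xsl:for-each>"
        + "</xsl:template>"
        + "</xsl:stylesheet>";

    public static String transform(String xml) throws TransformerException {
        // JAXP locates the XSLT implementation; the calling code stays the same either way.
        TransformerFactory factory = TransformerFactory.newInstance();
        Transformer transformer = factory.newTransformer(new StreamSource(new StringReader(XSL)));
        StringWriter out = new StringWriter();
        transformer.transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws TransformerException {
        System.out.print(transform("<books><title>XML</title><title>XSLT</title></books>"));
    }
}
```

`TransformerFactory.newInstance()` uses the JRE's built-in processor unless another implementation, such as Saxon, is found on the classpath or selected with the `javax.xml.transform.TransformerFactory` system property.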

Certifications

The formerly available IBM XML exams 141 and 142 were retired on 12/31/2012. Online certifications are available at and

These exam questions may help you gauge your XML knowledge, even if the associated exam is no longer available:

  • XML Design questions (by Ajith Kallambella)
  • Core XML (by Mapraputa Is)
  • DTD (by Sanjay Mishra and Dan Chisham)
  • DOM/SAX (by Kris VidhyaSagar)
  • XML 141 mock exam (by Shashank Tanksali)
  • IBM's XML Architecture prep guide

  • CategoryFaq XmlLinks