Parsing XML files: DOM vs SAX

 
jwalababu vedantam
Greenhorn
Posts: 1
What guidelines should one follow to get the best performance in XML parsing: DOM or SAX?
 
Ajith Kallambella
Sheriff
Posts: 5782
The choice between SAX and DOM should be driven by your design and what you want to achieve. Each parsing method has its own advantages and pitfalls.
SAX is lightweight: it is faster and does not consume a lot of memory. DOM, on the other hand, creates a tree representation of the whole XML document, so it consumes more memory. There is also quite a difference in the time taken to read the XML document. Since SAX is based on event callbacks, your handler starts receiving events soon after the call to parse, and your program takes part in the parsing process while the document is still being read. With DOM, the entire document has to be read and the tree built first, so there is some latency before the call to parse returns.
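To make the difference concrete, here is a minimal sketch that counts elements both ways using the standard JAXP classes. The class name SaxVsDomSketch, the two helper methods and the sample XML are made up for illustration; only the JAXP, SAX and DOM calls are real API.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.SAXParserFactory;
import org.w3c.dom.Document;
import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical class, not from the original post.
class SaxVsDomSketch {

    // SAX: the parser pushes events at the handler while it reads.
    // Nothing is kept in memory unless the handler keeps it.
    static int countWithSax(String xml, String tag) throws Exception {
        final int[] count = {0};
        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName,
                                     String qName, Attributes atts) {
                if (qName.equals(tag)) {
                    count[0]++;
                }
            }
        };
        SAXParserFactory.newInstance().newSAXParser()
            .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)), handler);
        return count[0];
    }

    // DOM: parse() returns only after the whole tree has been built;
    // after that the tree can be queried in any order.
    static int countWithDom(String xml, String tag) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
            .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        return doc.getElementsByTagName(tag).getLength();
    }

    public static void main(String[] args) throws Exception {
        String xml = "<orders><order id=\"1\"/><order id=\"2\"/><order id=\"3\"/></orders>";
        System.out.println("SAX count: " + countWithSax(xml, "order"));
        System.out.println("DOM count: " + countWithDom(xml, "order"));
    }
}
```

Both methods give the same answer; what differs is when your code runs (during the parse for SAX, after it for DOM) and how much of the document is held in memory.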
These observations should not make you think SAX is better than DOM in all situations. The parser calls your ContentHandler for every element, text run and other event in the document (extending DefaultHandler lets you override only the callbacks you care about, but the parser still makes the calls), so the number of callbacks grows in direct proportion to the size of the XML document. This, again, can be a performance issue because each callback is a method invocation with its own cost. DOM, on the other hand, just reads the XML, creates the tree and lets you manipulate the content, so your code is not on the receiving end of callbacks or repeated method invocations.
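If you want to see how the callback count scales, a rough sketch like the following tallies the handler invocations for documents of different sizes. CallbackCountSketch, CountingHandler and buildDocument are hypothetical names used only for this illustration, and the generated test documents are an assumption, not something from a real application.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical class, not from the original post.
class CallbackCountSketch {

    // Tallies only the three callbacks overridden here.
    static class CountingHandler extends DefaultHandler {
        int starts;
        int ends;
        int textChunks;

        @Override
        public void startElement(String uri, String localName,
                                 String qName, Attributes atts) {
            starts++;
        }

        @Override
        public void endElement(String uri, String localName, String qName) {
            ends++;
        }

        @Override
        public void characters(char[] ch, int start, int length) {
            // A parser is allowed to deliver one text run in several chunks.
            textChunks++;
        }

        int total() {
            return starts + ends + textChunks;
        }
    }

    // Builds <items><item>0</item><item>1</item>...</items> with the given item count.
    static String buildDocument(int items) {
        StringBuilder sb = new StringBuilder("<items>");
        for (int i = 0; i < items; i++) {
            sb.append("<item>").append(i).append("</item>");
        }
        return sb.append("</items>").toString();
    }

    static CountingHandler parse(String xml) throws Exception {
        CountingHandler handler = new CountingHandler();
        SAXParserFactory.newInstance().newSAXParser()
            .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)), handler);
        return handler;
    }

    public static void main(String[] args) throws Exception {
        for (int n : new int[] {10, 100, 1000}) {
            System.out.println(n + " items: " + parse(buildDocument(n)).total() + " callbacks");
        }
    }
}
```

The totals grow with the number of items. Whether that cost matters in practice is something to measure on your own documents rather than assume.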

While these are the performance considerations, the choice between SAX and DOM is also often driven by other factors, such as the mutability of the XML data and non-sequential access. These factors, however, are beyond the scope of this forum.
To summarize, SAX is good for small XML documents, provided you don't have to change the content (the XML data). DOM is bad for large documents because of its memory consumption.
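For the "change the content" case, here is a small sketch of what the DOM tree lets you do: jump straight to the last matching element, rewrite it, then read the first one. DomEditSketch, editLastReadFirst and the price element are made up for illustration, and the method assumes the document contains at least one price element.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

// Hypothetical class, not from the original post.
class DomEditSketch {

    static Document parse(String xml) throws Exception {
        return DocumentBuilderFactory.newInstance().newDocumentBuilder()
            .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
    }

    // Rewrites the text of the last <price> element, then returns the text
    // of the first one. Assumes at least one <price> element exists.
    static String editLastReadFirst(Document doc, String newPrice) {
        NodeList prices = doc.getElementsByTagName("price");
        Element last = (Element) prices.item(prices.getLength() - 1);
        last.setTextContent(newPrice);          // in-place change to the tree
        return prices.item(0).getTextContent(); // out-of-order read
    }

    public static void main(String[] args) throws Exception {
        Document doc = parse("<catalog><price>10</price><price>20</price></catalog>");
        System.out.println("first price: " + editLastReadFirst(doc, "25"));
        System.out.println("last price:  "
            + doc.getElementsByTagName("price").item(1).getTextContent());
    }
}
```

With SAX you would have to buffer the data yourself to do the same thing, since events arrive once, in document order.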
Hope that helps,

------------------

Ajith Kallambella M.
Sun Certified Programmer for the Java2 Platform.