posted 19 years ago
The main interface involved in SAX is ContentHandler. You write your own class that implements this interface and supply methods to respond to events. One method is called when the document starts, another when the document ends. One is called when an element starts, one when it ends. Between these two there may be calls to a "characters" method if there is text between the start and end tags. If elements are nested, you may get two starts and then two ends.
The entire processing is up to you. The events arrive in the same order as the input source. If you don't care about a specific element when it goes by, do nothing.
When the document end method is called, SAX is finished. Whatever you have kept in whatever format is all that is kept.
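To make the event sequence concrete, here is a minimal sketch in Java. It assumes you extend DefaultHandler (a convenience class from org.xml.sax.helpers that implements ContentHandler with empty methods) rather than implementing the interface directly. The class name EventLogger and the event strings are made up for illustration.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical handler: records each SAX event in the order it arrives.
class EventLogger extends DefaultHandler {
    // The only thing this handler keeps; SAX itself stores nothing.
    final List<String> events = new ArrayList<>();

    @Override
    public void startDocument() { events.add("startDocument"); }

    @Override
    public void endDocument() { events.add("endDocument"); }

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        events.add("start:" + qName);
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        events.add("end:" + qName);
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        // A parser is allowed to deliver one run of text in several calls.
        String text = new String(ch, start, length).trim();
        if (!text.isEmpty()) events.add("chars:" + text);
    }

    // Parses an XML string and returns the recorded events.
    static List<String> parse(String xml) {
        try {
            EventLogger handler = new EventLogger();
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), handler);
            return handler.events;
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }
}
```

Calling EventLogger.parse("<a><b>hi</b></a>") should show the nesting described above: the starts for a and b, the text, then the ends for b and a, all between the document start and end events.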
This is in contrast to DOM, which reads the entire input and constructs a tree of elements. The entire source is represented by the tree. You can move elements or attributes around to make a different file, or you can run it through a transformer. You can search it using XPath to find sequences of elements or structures in the document and process them as you wish. When you are done, you can serialize it to produce an XML file or an XML-format stream.
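A rough sketch of that DOM workflow in Java, using the standard JAXP classes: parse into a tree, find nodes with XPath, move one, then serialize with an identity Transformer. The class name DomSketch, the method name, and the /list/item document shape are assumptions made up for the example.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical example: reorder elements in a DOM tree and write it back out.
class DomSketch {
    static String moveLastItemFirst(String xml) {
        try {
            // 1. The whole input is read and turned into a tree of Nodes.
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));

            // 2. XPath searches the finished tree (assumed shape: /list/item).
            NodeList items = (NodeList) XPathFactory.newInstance().newXPath()
                    .evaluate("/list/item", doc, XPathConstants.NODESET);

            // 3. Inserting a node that is already in the tree moves it.
            if (items.getLength() > 0) {
                Node last = items.item(items.getLength() - 1);
                Node parent = last.getParentNode();
                parent.insertBefore(last, parent.getFirstChild());
            }

            // 4. Serialize the tree back to XML text.
            Transformer t = TransformerFactory.newInstance().newTransformer();
            t.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            t.transform(new DOMSource(doc), new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }
}
```

Nothing like step 3 is possible with SAX alone, because by the time the last item goes by, the first one is gone unless you saved it yourself.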
So, SAX is a Simple API for XML, as its name implies. It does not have large demands for memory. You can process a huge file, and if you don't want to keep much data, or you are only summing data from the elements that go by, you will not require much memory. DOM builds a tree of Nodes to represent the entire file. It takes more space to hold an element as a Node than it takes for the minimal character representation -- "<a/>" is 4 characters, while the Node for it can take dozens or hundreds of bytes.
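As an illustration of the "summing data as it goes by" case, here is a small Java sketch. The element name item and the attribute name amount are invented for the example, and AmountSummer is a hypothetical class name. The handler keeps a single long no matter how large the input is.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical handler: adds up the "amount" attribute of every <item>.
class AmountSummer extends DefaultHandler {
    long total = 0; // the only state kept while the document streams past

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        // Elements we don't care about are simply ignored.
        if ("item".equals(qName)) {
            String amount = atts.getValue("amount");
            if (amount != null) total += Long.parseLong(amount);
        }
    }

    // Parses an XML string and returns the sum of all item amounts.
    static long sum(String xml) {
        try {
            AmountSummer handler = new AmountSummer();
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), handler);
            return handler.total;
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }
}
```

For a real file you would pass a file or stream to the parser instead of a string; the handler itself would not change.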
Both will process the same input, and with SAX, you will see all input as it goes by. You may keep what you want in whatever format you want. But, if you don't keep it, it is not stored somewhere for you to process unless you run the input source through SAX again.
Does this help you understand the differences?
Dave Patterson