DOM
DOM is a language- and platform-neutral definition, i.e. interfaces are defined for the different objects comprising the DOM, but no specifics of implementation are provided.
An XML document can be represented as a tree that shows all elements and their relationships to one another.
Each item in the tree is a node – an atomic piece of information that can be manipulated.
How DOM works:
1. Parses the file, breaking it into individual elements, attributes etc.
2. Creates a representation of the XML file as a node tree
3. Provides access to the contents of the document through the node tree
DOM loads an entire document into memory and parses it into a document tree, so its memory use grows with the size of the document and can become excessive for large files.
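The three steps above can be sketched with Python's standard-library DOM implementation, xml.dom.minidom. This is only an illustration: the sample document and the variable names are assumptions made up for the example, not part of these notes.

```python
from xml.dom import minidom

# Hypothetical sample document, used only for illustration.
XML = (
    "<library>"
    "<book id='b1'><title>Dune</title></book>"
    "<book id='b2'><title>Emma</title></book>"
    "</library>"
)

# Steps 1 and 2: parse the text and build the complete node tree in memory.
doc = minidom.parseString(XML)

# Step 3: reach the content by walking the node tree.
root = doc.documentElement
books = root.getElementsByTagName("book")
titles = [b.getElementsByTagName("title")[0].firstChild.data for b in books]
ids = [b.getAttribute("id") for b in books]

print(root.tagName, titles, ids)
```

Because the whole tree is held in memory, any node can be revisited or modified after parsing, which is the trade-off against the memory cost noted above.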
Node parentNode – a property in object Node : For Document, DocumentFragment and Attr nodes, parentNode is always null
Node cloneNode(boolean deep) – a method in object Node : The duplicate node that is created has no parent (its parentNode is null)
Element createElement(DOMString tagName) – a method in Document : This does not attach the new element to the node tree; to attach it, call appendChild() on the intended parent.
Node setNamedItem(Node nodeArg) – a method in NamedNodeMap : The same Attr node may not be added to more than one element; it must first be cloned before it can be added elsewhere.
Node removeNamedItem(DOMString name) – a method in NamedNodeMap : If the removed Attr node has a default value, it is immediately replaced by an attribute carrying that default value.
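A minimal sketch of the parentNode, createElement and cloneNode notes above, again using xml.dom.minidom as a stand-in DOM implementation. The sample document is hypothetical, and the attribute-related methods (setNamedItem, removeNamedItem) are deliberately left out because minidom's handling of them is not assumed here.

```python
from xml.dom import minidom

# Hypothetical sample document, used only for illustration.
doc = minidom.parseString("<order><item sku='A1'>pen</item></order>")
root = doc.documentElement

# A Document node never has a parent.
doc_parent = doc.parentNode                  # None

# createElement() only creates the node; it is not in the tree yet.
note = doc.createElement("note")
parent_before = note.parentNode              # None
root.appendChild(note)                       # attach it explicitly
parent_after = note.parentNode               # the <order> element

# cloneNode(True) copies the whole subtree, but the copy is detached.
item = root.getElementsByTagName("item")[0]
copy = item.cloneNode(True)
copy_parent = copy.parentNode                # None
print(copy.toxml())                          # <item sku="A1">pen</item>
```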
SAX
SAX parsers process XML documents sequentially.
Each element is parsed down to its leaf nodes before the parser moves on to the next sibling of that element, and the parser keeps no record of the tree, so the application itself must keep track of what level of the tree it is at.
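As a small illustration of that ordering, the sketch below uses Python's xml.sax module and keeps its own depth counter, since the parser only reports start and end events. The class name DepthTracker and the sample input are assumptions made for the example.

```python
import xml.sax


class DepthTracker(xml.sax.ContentHandler):
    """Records (depth, tag) for every start tag; depth is tracked by hand."""

    def __init__(self):
        super().__init__()
        self.depth = 0
        self.seen = []

    def startElement(self, name, attrs):
        self.seen.append((self.depth, name))
        self.depth += 1          # entering an element: one level deeper

    def endElement(self, name):
        self.depth -= 1          # leaving an element: back up one level


handler = DepthTracker()
xml.sax.parseString(b"<a><b><c/></b><d/></a>", handler)
print(handler.seen)  # [(0, 'a'), (1, 'b'), (2, 'c'), (1, 'd')]
```

Element b is followed all the way down to c before its sibling d is reported.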
When is SAX used:
1. For handling large documents
2. Retrieving a specific value
3. Creating a subset of the document
When is DOM used:
1. Modifying the document (as SAX is read-only)
2. Random access
Handler interfaces:
1. org.xml.sax.ContentHandler
2. org.xml.sax.ErrorHandler
3. org.xml.sax.DTDHandler
4. org.xml.sax.EntityResolver
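Python's xml.sax.handler module provides classes with the same names as the first two interfaces (ContentHandler and ErrorHandler). The sketch below uses them to retrieve one specific value from a document, one of the SAX use cases listed earlier, and to record a fatal parse error. The classes TitleFinder and RecordingErrors and the sample data are hypothetical, written only for this example.

```python
import xml.sax
from xml.sax.handler import ContentHandler, ErrorHandler

# Hypothetical sample data, used only for illustration.
DATA = (b"<library>"
        b"<book id='b1'><title>Dune</title></book>"
        b"<book id='b2'><title>Emma</title></book>"
        b"</library>")


class TitleFinder(ContentHandler):
    """Collects the <title> text of the <book> whose id matches wanted_id."""

    def __init__(self, wanted_id):
        super().__init__()
        self.wanted_id = wanted_id
        self.in_book = False
        self.in_title = False
        self.title = ""

    def startElement(self, name, attrs):
        if name == "book":
            self.in_book = attrs.get("id") == self.wanted_id
        elif name == "title" and self.in_book:
            self.in_title = True

    def characters(self, content):
        # May be called several times for one text run, so accumulate.
        if self.in_title:
            self.title += content

    def endElement(self, name):
        if name == "title":
            self.in_title = False
        elif name == "book":
            self.in_book = False


class RecordingErrors(ErrorHandler):
    """Records fatal errors, then re-raises so that parsing stops."""

    def __init__(self):
        self.fatal = []

    def fatalError(self, exception):
        self.fatal.append(exception.getMessage())
        raise exception


finder = TitleFinder("b2")
xml.sax.parseString(DATA, finder, RecordingErrors())
print(finder.title)  # Emma

# A document that is not well formed is reported through the error handler.
errors = RecordingErrors()
try:
    xml.sax.parseString(b"<a><b></a>", ContentHandler(), errors)
except xml.sax.SAXParseException:
    pass
print(len(errors.fatal))
```

The handler keeps only a few flags and one string, so memory use stays flat however large the input is.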
XPath
XPath allows selected or filtered information from within the source XML data or document to be exchanged or displayed
XPath is designed to enable addressing of or navigation to chosen parts of an XML document
XPath is used to navigate a hierarchical structure
Context node: starting point
Axis – specific direction that is selected
Location steps – steps taken to reach the destination
There are 7 node types and 13 axes
A location path consists of one or more location steps
Node test: specifies the type of node selected and its expanded name
Location step has 3 parts: axis, node test, zero or more predicates
XPath has both abbreviated and unabbreviated syntax
Unabbreviated syntax – axis::nodetest[predicate]
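To make the two syntaxes concrete, here is a short sketch using Python's xml.etree.ElementTree, which accepts only a limited subset of the abbreviated syntax. The unabbreviated form is therefore shown as a comment rather than executed. The sample document is an assumption made for the example.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document, used only for illustration.
root = ET.fromstring(
    "<library>"
    "<book id='b1'><title>Dune</title></book>"
    "<book id='b2'><title>Emma</title></book>"
    "</library>"
)

# Abbreviated location path, evaluated with `root` as the context node:
#   book[@id='b2']/title
# Unabbreviated equivalent (not accepted by ElementTree):
#   child::book[attribute::id='b2']/child::title
# Two location steps: the first has a predicate, the second has none.
matches = root.findall("book[@id='b2']/title")
found = [t.text for t in matches]
print(found)  # ['Emma']
```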
The representation of the abstract structure of the XML document is called a data model
7 nodes are:
1. Root node
2. Element node
3. Attribute node
4. Text node
5. Comment node
6. Namespace node
7. ProcessingInstruction node
An element node is the parent of an attribute node but an attribute node is not the child of its parent element node.
The string value of a node is determined as follows:
1. Root node: the string value is the concatenation of the string values of all text node descendants of the root node in document order.
2. Element node: the string value is the concatenation of the string values of all text node descendants of the element node in document order.
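The string value of an element can be approximated in Python with ElementTree's itertext(), which walks all text inside an element in document order. The sample markup below is hypothetical. The second computation is included only to show that text inside nested elements counts too, not just the text directly under the element.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample markup, used only for illustration.
para = ET.fromstring("<p>Hello <b>big <i>wide</i></b> world</p>")

# String value: every text node inside <p>, at any depth, in document order.
string_value = "".join(para.itertext())
print(string_value)  # Hello big wide world

# For contrast: only the text nodes that are direct children of <p>.
direct_text_only = (para.text or "") + "".join(c.tail or "" for c in para)
```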
A comment node does not have any expanded name
A location path is a special form of an XPath expression that always evaluates to a node-set.
Context in XPath has 5 parts:
1. Context node
2. A pair of positive integers (context position, context size), where the position is never greater than the size
3. A set of variable bindings
4. A function library
5. The set of namespace declarations
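Context position and context size are what positional predicates test. A minimal sketch with xml.etree.ElementTree, whose XPath subset supports an integer position, last() and last()-1 inside a predicate; the sample list is an assumption made for the example.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document, used only for illustration.
root = ET.fromstring("<list><item>a</item><item>b</item><item>c</item></list>")

# Context position 1: the first <item> child.
first = [e.text for e in root.findall("item[1]")]

# last() is the context size (3 here), so this picks the final <item>.
last = [e.text for e in root.findall("item[last()]")]

# One position before the last.
second_last = [e.text for e in root.findall("item[last()-1]")]

print(first, last, second_last)  # ['a'] ['c'] ['b']
```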
13 axes are:
1. child
2. parent
3. ancestor
4. descendant
5. ancestor-or-self
6. descendant-or-self
7. following-sibling
8. preceding-sibling
9. following
10. preceding
11. attribute
12. namespace
13. self
Important points on axes:
1. child – never returns an attribute or a namespace node
2. parent – if the context node is the root node, the parent axis is empty
3. descendant – all children, children's children and so on are returned, except attribute or namespace nodes
4. following-sibling – if the context node is an attribute or namespace node, following-sibling is empty
5. preceding-sibling – if the context node is an attribute or namespace node, preceding-sibling is empty
6. following – all nodes after the context node in document order are returned, excluding its descendants and any attribute or namespace nodes
7. preceding – all nodes before the context node in document order are returned, excluding its ancestors and any attribute or namespace nodes
8. attribute – if the context node is not an element, the axis is empty
9. namespace – if the context node is not an element, the axis is empty
10. self – contains just the context node
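Python's standard library has no full XPath engine, so the sketch below mimics a few of the axes by hand over an xml.dom.minidom tree, purely to show which nodes each axis covers. The helper functions and the sample document are hypothetical and are not an XPath implementation.

```python
from xml.dom import minidom

# Hypothetical sample document, used only for illustration.
doc = minidom.parseString("<a><b><c/></b><d/><e><f/></e></a>")
a = doc.documentElement
b = a.firstChild


def child(node):
    # childNodes never contains attribute nodes.
    return list(node.childNodes)


def descendant(node):
    # Children, children's children and so on, in document order.
    out = []
    for c in node.childNodes:
        out.append(c)
        out.extend(descendant(c))
    return out


def following_sibling(node):
    out = []
    n = node.nextSibling
    while n is not None:
        out.append(n)
        n = n.nextSibling
    return out


def following(node):
    # Everything after the node in document order, minus its descendants:
    # the later siblings (with their subtrees) of the node and of each ancestor.
    out = []
    n = node
    while n is not None:
        for sib in following_sibling(n):
            out.append(sib)
            out.extend(descendant(sib))
        n = n.parentNode
    return out


def names(nodes):
    return [n.nodeName for n in nodes]


print(names(child(a)))              # ['b', 'd', 'e']
print(names(descendant(a)))         # ['b', 'c', 'd', 'e', 'f']
print(names(following_sibling(b)))  # ['d', 'e']
print(names(following(b)))          # ['d', 'e', 'f']
```

Note that c, a descendant of b, appears in descendant(a) but not in following(b).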
An XPath expression evaluates to one of these: node-set, boolean, string or number
Node-set functions:
1. count()
2. id()
3. name()
4. local-name()
5. last()
6. namespace-uri()
7. position()
Boolean functions:
1. true()
2. false()
3. not()
4. lang()
5. boolean()
Number functions:
1. ceiling()
2. floor()
3. round()
4. number()
5. sum()
String functions:
1. concat()
2. contains()
3. normalize-space()
4. string()
5. substring()
6. string-length()
7. starts-with()
8. substring-before()
9. substring-after()
10. translate()
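The behaviour of a few of the string functions can be sketched in plain Python. These are hand-written approximations for illustration only: the Python function names are made up for the example, and an XPath engine would be used in practice.

```python
import re


def substring_before(s, sep):
    # Part of s before the first occurrence of sep; empty if sep is absent.
    i = s.find(sep)
    return s[:i] if i >= 0 else ""


def substring_after(s, sep):
    # Part of s after the first occurrence of sep; empty if sep is absent.
    i = s.find(sep)
    return s[i + len(sep):] if i >= 0 else ""


def normalize_space(s):
    # Strip leading/trailing XML whitespace and collapse inner runs to one space.
    return " ".join(re.findall(r"[^ \t\r\n]+", s))


def translate(s, src, dst):
    # Replace each character of src by the one at the same position in dst;
    # characters of src with no counterpart in dst are removed.
    table = {}
    for i, ch in enumerate(src):
        if ord(ch) not in table:            # first occurrence in src wins
            table[ord(ch)] = dst[i] if i < len(dst) else None
    return s.translate(table)


print(substring_before("1999/04/01", "/"))  # 1999
print(substring_after("1999/04/01", "/"))   # 04/01
print(normalize_space("  a \n  b  "))       # a b
print(translate("bar", "abc", "ABC"))       # BAr
print(translate("--aaa--", "abc-", "ABC"))  # AAA
```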
Absolute location paths are a special case of relative location paths in which the context node is the root node of the XML document.