This is a long document, so I am posting the first part of it. I am still typing the rest and will post it as soon as it's done.
BASIC XML
XML markup describes and provides structure to the content of an XML document or data packet.
Unlike HTML, XML is case-sensitive, including element tags and attribute values.
XML uses most of the characters defined in the 16-bit Unicode character set.
2 Unicode encodings are the basis of XML characters: UTF-8 and UTF-16.
3 control characters are:
Horizontal Tab(HT) 09
Line Feed (LF) 0A
Carriage-Return (CR) 0D
5 special markup characters are: < > & ' " These characters have alternate representations in the form of entity references.
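As a rough illustration of the entity references for the 5 special characters, here is a minimal Python sketch using only the standard library. The element name "note" and the sample string are made up for the example.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

raw = 'if a < b && c > "d" then \'e\''

# escape() replaces &, < and > by default; the extra mapping adds the two quote characters.
escaped = escape(raw, {'"': "&quot;", "'": "&apos;"})

# The parser turns the entity references back into the original characters.
element = ET.fromstring(f"<note>{escaped}</note>")

print(escaped)
print(element.text == raw)
```

The escaped string is what goes into the document; the parser hands the original characters back to the application.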
Legal XML names:
First char       | Other chars (NmToken)
Unicode letter   | Unicode letter
Underscore       | Unicode digit
Colon            | Underscore
                 | Colon
                 | Hyphen
                 | Period
Colon char should not be used except as a namespace delimiter
XML names should not begin with the string "XML" in any form.
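One way to try out the naming rules is to let a parser decide. The sketch below uses Python's standard library; is_accepted_element_name is a hypothetical helper, and it only makes sense for candidate strings without whitespace or markup characters. Colons are left out because ElementTree is namespace-aware and rejects undeclared prefixes, and names starting with "XML" are not tested because parsers generally do not reject them even though they are reserved.

```python
import xml.etree.ElementTree as ET


def is_accepted_element_name(name: str) -> bool:
    """Return True if the parser accepts `name` as an element type name."""
    try:
        ET.fromstring(f"<{name}/>")
        return True
    except ET.ParseError:
        return False


candidates = ["_total", "item-1.a", "1item", "-item", ".item"]
results = {name: is_accepted_element_name(name) for name in candidates}

for name, accepted in results.items():
    print(f"{name!r}: {'accepted' if accepted else 'rejected'}")
```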
Elements are the basic building blocks of XML markup. Tags consist of element type names.
Everything between the start-tag and the end-tag of an element is contained within that element.
Examples: (here "sp" is a space)
1. < sp ElementName> not allowed
2. <Name sp> allowed
3. <sp /Name> not allowed
4. </sp Name> not allowed
5. </Name sp> allowed
6. </Name /Name> not allowed
7. <Name sp/> allowed
8. <Name / sp> not allowed
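The whitespace rules above can be checked against a real parser. This is a small sketch with Python's standard library; is_well_formed is a hypothetical helper, the keys follow the numbering of the list above, and item 6 is left out.

```python
import xml.etree.ElementTree as ET


def is_well_formed(document: str) -> bool:
    """Return True if the parser accepts the document."""
    try:
        ET.fromstring(document)
        return True
    except ET.ParseError:
        return False


cases = {
    1: "< Name></Name>",   # space before the name in a start-tag
    2: "<Name ></Name>",   # space after the name in a start-tag
    3: "<Name>< /Name>",   # space before the slash in an end-tag
    4: "<Name></ Name>",   # space after the slash in an end-tag
    5: "<Name></Name >",   # space after the name in an end-tag
    7: "<Name />",         # space before the slash in an empty-element tag
    8: "<Name / >",        # space after the slash in an empty-element tag
}

results = {number: is_well_formed(document) for number, document in cases.items()}

for number, allowed in results.items():
    print(number, "allowed" if allowed else "not allowed")
```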
Empty element tags may have associated attributes
XML documents have three parts: prolog (optional), body (required) and epilog (optional)
The document root (document entity) is the root of the XML document and is not visible. It has a subtree (the body), and the root element of that subtree is called the document element or root element.
Prolog may contain: XML declaration, comments, PIs, DOCTYPE declaration
Epilog may contain: PIs or comments.
XML data is in the form of a simple hierarchical tree.
All elements must be properly nested, no overlapping of tags is allowed.
String literals are used for the values of attributes, internal entities and external identifiers.
All string literals are enclosed by apos (') or quot (")
Attributes are comprised of name-value pairs.
Attributes:
1. Permissible values may be:
Text characters
Entity references
character references
2. Forbidden characters in attribute values: < and &. Use the entity references instead.
3. Only one instance of attribute name is allowed within a given tag.
All whitespace characters in the content are preserved, whereas whitespace within element tags and attribute values may be normalized or removed.
3 combinations of chars for end-of-line are: CR-LF, CR only, LF only. All these strings are converted to a single LF character.
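To see the whitespace and end-of-line handling in practice, here is a minimal sketch with Python's standard library parser (expat underneath). The element and attribute names are made up for the example.

```python
import xml.etree.ElementTree as ET

# Element content mixing the three end-of-line forms: CR-LF, CR only, LF only.
content = ET.fromstring("<a>one\r\ntwo\rthree\nfour</a>")

# A tab and a line break inside an attribute value.
attribute = ET.fromstring('<a note="x\ty\nz"/>')

print(repr(content.text))
print(repr(attribute.get("note")))
```

With this parser, every line break in the content comes back as a single LF, and the tab and line break inside the attribute value come back as spaces.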
Except for the 5 built-in entity references, all entities must be defined prior to their use.
Comments:
1. Can't have double hyphen within the string
2. Can't be nested
3. Can't be put in the start or end tag
4. Extra hyphen at the end is illegal
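The four comment rules can be tried against a parser. A small sketch using Python's standard library follows; is_well_formed is a hypothetical helper and the sample documents are invented.

```python
import xml.etree.ElementTree as ET


def is_well_formed(document: str) -> bool:
    """Return True if the parser accepts the document."""
    try:
        ET.fromstring(document)
        return True
    except ET.ParseError:
        return False


comment_cases = {
    "plain comment": "<a><!-- plain comment --></a>",
    "double hyphen": "<a><!-- double -- hyphen --></a>",
    "nested": "<a><!-- outer <!-- inner --> outer --></a>",
    "inside a tag": "<a <!-- inside a tag -->></a>",
    "extra hyphen at end": "<a><!-- extra hyphen ---></a>",
}

comment_results = {label: is_well_formed(doc) for label, doc in comment_cases.items()}

for label, accepted in comment_results.items():
    print(f"{label}: {'accepted' if accepted else 'rejected'}")
```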
CDATA Section:
1. May be empty: the XML 1.0 grammar allows <![CDATA[]]>, although an empty section serves no purpose
2. Can't be nested
3. Text in the CDATA section can't contain "]]>"
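Here is a short sketch of how CDATA sections behave, using Python's standard library. wrap_in_cdata is a hypothetical helper showing one common workaround for text that contains "]]>": the text is split across two adjacent CDATA sections. The element name "code" is made up.

```python
import xml.etree.ElementTree as ET

# Markup characters inside a CDATA section are plain text, not tags.
literal = ET.fromstring("<code><![CDATA[if (a < b && b > c) <tag>]]></code>")


def wrap_in_cdata(text: str) -> str:
    """Wrap text in CDATA, splitting any "]]>" across two sections."""
    return "<![CDATA[" + text.replace("]]>", "]]]]><![CDATA[>") + "]]>"


tricky = "array[index[0]]> limit"
round_trip = ET.fromstring(f"<code>{wrap_in_cdata(tricky)}</code>")

# An attempt at nesting: the first "]]>" closes the section early,
# and the leftover "]]>" in the content makes the document ill-formed.
try:
    ET.fromstring("<code><![CDATA[ outer <![CDATA[ inner ]]> outer ]]></code>")
    nested_accepted = True
except ET.ParseError:
    nested_accepted = False

print(literal.text)
print(round_trip.text == tricky)
print(nested_accepted)
```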
XML Declaration
1. Order of attributes: version, encoding, standalone is fixed.
2. Version attribute is required, encoding and standalone are optional.
3. Default value for standalone is "no"
4. If encoding is other than UTF-8 or UTF-16, it must be specified.
5. Encoding values are not case-sensitive
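The declaration rules can be observed through the XmlDeclHandler callback of Python's xml.parsers.expat module. This is only a sketch; read_declaration and is_accepted are hypothetical helpers. Note that expat reports an omitted standalone clause as -1 rather than filling in "no".

```python
import xml.parsers.expat


def read_declaration(document: str):
    """Return (version, encoding, standalone) as reported by expat, or None."""
    seen = []
    parser = xml.parsers.expat.ParserCreate()
    parser.XmlDeclHandler = lambda version, encoding, standalone: seen.append(
        (version, encoding, standalone)
    )
    parser.Parse(document, True)
    return seen[0] if seen else None


def is_accepted(document: str) -> bool:
    """Return True if expat parses the document without an error."""
    try:
        read_declaration(document)
        return True
    except xml.parsers.expat.ExpatError:
        return False


full = read_declaration('<?xml version="1.0" encoding="UTF-8" standalone="yes"?><a/>')
minimal = read_declaration('<?xml version="1.0"?><a/>')

# Attributes out of order, and a declaration without the required version.
wrong_order = is_accepted('<?xml encoding="UTF-8" version="1.0"?><a/>')
missing_version = is_accepted('<?xml encoding="UTF-8"?><a/>')

print(full, minimal, wrong_order, missing_version)
```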
Special-meaning attributes: xml:lang and xml:space (which can have the values preserve or default)
An XML document has a logical and a physical structure. Physical: the document is made up of storage units called entities. Logical: the document is composed of declarations, elements, comments, character references and PIs
The Document Type Declaration contains or points to markup declarations that provide a grammar for a class of documents. This grammar is known as the Document Type Definition.
No attribute name may appear more than once in the same start tag or empty element tag.
Attribute values cannot contain direct or indirect entity references to external entities.
Document Type Definitions (DTD)
DTDs are a set of rules that define how XML data should be structured.
Cooperating applications can share a single description of data, known as an XML vocabulary. A group of XML documents that share a common XML vocabulary is known as a document type, and each individual document that conforms to a document type is a document instance.
Multiple documents and applications can share DTDs
Validity constraints ensure that any XML data conforms to its associated DTD.
Only one DTD may be associated with a given XML document or data object.
A DTD has 2 parts: an internal subset and an external subset. DTD declarations in the internal subset have priority over those in the external subset.
An XML document can be associated with only one DTD using a single DOCTYPE declaration.
Syntax of DOCTYPE declaration:
1. <!DOCTYPE doc_element SYSTEM location [internal_subset]>
2. <!DOCTYPE doc_element PUBLIC identifier location [internal_subset]>
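As a sketch of the two DOCTYPE forms, the StartDoctypeDeclHandler callback of Python's xml.parsers.expat module reports the document element name, the location (system identifier), the public identifier and whether an internal subset is present. read_doctype is a hypothetical helper, and "note.dtd" and the public identifier are invented. By default expat does not fetch the external file, so it does not need to exist.

```python
import xml.parsers.expat


def read_doctype(document: str):
    """Return (doc_element, location, public_id, has_internal_subset) or None."""
    seen = []
    parser = xml.parsers.expat.ParserCreate()
    parser.StartDoctypeDeclHandler = (
        lambda name, system_id, public_id, has_internal_subset: seen.append(
            (name, system_id, public_id, bool(has_internal_subset))
        )
    )
    parser.Parse(document, True)
    return seen[0] if seen else None


system_doc = (
    '<!DOCTYPE note SYSTEM "note.dtd" [<!ELEMENT note (#PCDATA)>]>'
    "<note>hi</note>"
)
public_doc = (
    '<!DOCTYPE note PUBLIC "-//EXAMPLE//DTD Note//EN" "note.dtd">'
    "<note>hi</note>"
)

system_info = read_doctype(system_doc)
public_info = read_doctype(public_doc)

print(system_info)
print(public_info)
```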
Only comments and PIs can be inserted between XML declaration and DOCTYPE declaration.
DTDs are associated with the entire element tree via the document element.
The "#" character as URI fragment identifier cannot be used in the location of a DTD.
The use of PUBLIC identifier should be limited to internal systems and legacy SGML applications.
Four basic keywords used in DTD declaration are:
1. ELEMENT
2. ATTLIST
3. NOTATION
4. ENTITY
ELEMENT:
Syntax: <!ELEMENT ele_name content_category>
<!ELEMENT ele_name (content_model)cardinality>
Content_category : ANY or EMPTY
Content_Model : Text only, Element only, Mixed
Child elements in mixed content can appear (or not) in any order, any number of times.
Syntax of mixed content: <!ELEMENT foo (#PCDATA | child1 | child2)*>
1. No fixed sequence
2. #PCDATA must be the first item
3. The "*" operator is needed because mixed content doesn't constrain the number of occurrences of the child elements.
ATTLIST declaration Syntax: <!ATTLIST element_name attrName attrType attrDefault defaultValue>
Attribute defaults: #REQUIRED, #IMPLIED, #FIXED, Default values
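A sketch of the four kinds of attribute defaults, using Python's standard library. The element and attribute names are invented. The parser does not validate (so #REQUIRED is not enforced here), but it does read the internal subset and supplies the #FIXED and default values when the attribute is absent.

```python
import xml.etree.ElementTree as ET

document = """<!DOCTYPE order [
  <!ELEMENT order EMPTY>
  <!ATTLIST order
    id       CDATA #REQUIRED
    note     CDATA #IMPLIED
    currency CDATA #FIXED "USD"
    status   CDATA "new">
]>
<order id="42"/>"""

order = ET.fromstring(document)

# Only id is written in the instance; currency and status come from the DTD.
for name in ("id", "note", "currency", "status"):
    print(name, "=", order.get(name))
```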
Attribute types: (10 in number)
1. CDATA
2. Enumeration
3. ID
4. IDREF
5. IDREFS
6. NMTOKEN
7. NMTOKENS
8. NOTATION
9. ENTITY
10. ENTITIES
Order of attributes cannot be enforced
ID attribute type must not be used with #FIXED
ID value must be unique within a given document
Only one ID attribute for each element type
NMTOKEN attribute prevents the inclusion of whitespace and some punctuation characters
NOTATION can be used to identify
1. The format of unparsed entities
2. The format of element attributes of ENTITY and ENTITIES type
3. The application associated with a PI
Entities can be used to include a document inside a DTD
SCHEMA
Advantages of XML Schema
1. Support for data-types
2. Uses XML syntax
3. Support for content model (mixed content, exact number of occurrences of elements, named group of elements)
4. Extensible
5. Self documenting
Schema components is a generic term for the blocks that make up the abstract data model of the schema
3 groups of components : Primary, Secondary, Helper
Primary components:
1. Element declaration
2. Simple type definition
Built-in types: Primitive, Derived
3 varieties of data-types: Atomic, List, Union
User derived types
3. Complex type definition
4. Attribute Declaration
Default value of minOccurs and maxOccurs is 1
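As a rough illustration of the occurrence defaults, the sketch below reads a small schema document as plain XML with Python's standard library. It is not schema validation (the standard library has none); occurrence_bounds is a hypothetical helper and the schema content is invented.

```python
import xml.etree.ElementTree as ET

XS = "http://www.w3.org/2001/XMLSchema"

schema_text = """<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="order">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="id" type="xs:int"/>
        <xs:element name="note" type="xs:string" minOccurs="0"/>
        <xs:element name="item" type="xs:string" maxOccurs="unbounded"/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>"""


def occurrence_bounds(element_decl):
    """Return (minOccurs, maxOccurs), applying the default of 1 when absent.

    An unbounded maxOccurs is returned as None.
    """
    low = int(element_decl.get("minOccurs", "1"))
    high = element_decl.get("maxOccurs", "1")
    return low, (None if high == "unbounded" else int(high))


schema = ET.fromstring(schema_text)
sequence = schema.find(f"{{{XS}}}element/{{{XS}}}complexType/{{{XS}}}sequence")
bounds = {decl.get("name"): occurrence_bounds(decl) for decl in sequence}

print(bounds)
```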
Simple type can not have any child elements or carry attributes.
Simple types are the atoms of information considered distinct to XML Schema and they cannot be split up.
Primitive data types are data types in their own right and they are not defined in terms of other types
Derived types are built from the definitions of other data-types
User derived types are derived by the author of the schema and are particular to that schema.
An atomic data type is one whose value cannot be divided, at least not in the context of XML Schema. Atomic is not the same as primitive: an atomic type can be primitive or derived.
Built-in list types are: IDREFS, ENTITIES, NMTOKENS
A named complex type is created when the content model is to be reused; otherwise anonymous types can be created.
"schema" is the root element of the Schema document.
Attributes are to be defined as part of the complex type because simple types can only hold atomic values and not carry attributes or have child elements.
Content models: ANY, EMPTY, Element only, Mixed
ANY is the default content model
For EMPTY, define a complex type that restricts "anyType" so that it can only carry attributes.
Secondary components:
1. Model group definition
2. Attribute groups
3. Notation declaration
4. Identity constraints
Unique Values
Key and KeyRef
Default or Fixed element content
Specifying null values
Attribute groups can nest other attribute groups inside them and, rather like attribute declarations, should appear at the end of the complex type.
Notation declaration: associates a name with an identifier for an application used to view that sort of notation.
Key and KeyRef: primary key and foreign key respectively.
XML Schema data types are composed of three parts: Value space, Lexical space, Set of facets.
Value spaces have certain facets: order, bound, cardinality, equality, numeric or non-numeric dichotomy.
ENTITIES
All XML documents are composed of units of storage called entities.
Document entity serves as the starting point for an XML parser.
External and internal subsets of DTD are also entities, but unnamed ones.
Main categories of entities:
Internal vs External
Parsed vs Unparsed
General vs Parameter
Internal entities can only be parsed
External entities can be both parsed/unparsed
General entities can be both parsed/unparsed
Parameter entities are always parsed entities and so can be internal or external
General entities are referenced by using the entity reference "&name;". Parameter entities are referenced as "%name;".
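Here is a minimal sketch of a general entity reference being expanded, using Python's standard library. The entity name "company" and its value are made up. The second part shows that a reference to an entity that was never declared is rejected.

```python
import xml.etree.ElementTree as ET

document = """<!DOCTYPE memo [
  <!ENTITY company "Example Corp">
]>
<memo>Welcome to &company; &amp; friends</memo>"""

memo = ET.fromstring(document)

# A reference to an entity with no declaration is a parse error.
try:
    ET.fromstring("<memo>&undeclared;</memo>")
    undeclared_accepted = True
except ET.ParseError:
    undeclared_accepted = False

print(memo.text)
print(undeclared_accepted)
```

Expanding entities from untrusted input can be abused, so this is only meant for documents you control.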
Unparsed entities:
1. May or may not be text
2. Need not be XML text
3. Must have associated notation
4. Can only be used as the value of an attribute having ENTITY/ENTITIES type
The defining declaration should precede any references to the entity
References to general entities cause fatal XML parse errors in these cases:
1. Any reference to an unparsed entity
2. Any character or general entity reference in the DTD, except within an entity or attribute value
3. Any reference to an external entity from within an attribute value
Unparsed entities are always external
Entities can never be empty
An entity reference must not contain the name of an unparsed entity.
NAMESPACES
XML Namespace is a named collection of names
Qualified name - namespace prefix:local name
A namespace declaration applies to the element in which it is declared and to that element's descendants, unless it is overridden further down.
Unqualified attribute names do not belong to any namespace
Qualified attribute names belong to the associated namespace
Attributes are not explicitly part of any default namespace
Default namespace can be disabled by using an empty value in the default namespace declaration
XML namespaces do not work well with DTDs
An XML namespace is a collection of element type and attribute names
The two-part naming system is the only thing defined by the XML namespace recommendation
XML namespaces contain names of element types and attributes not the elements or attributes themselves
If an element type or attribute name is not specifically declared to be in an XML namespace and there is no default namespace then that name is not in any XML namespace
XML namespaces do not apply to entity names, notation names or PI targets
No namespace declarations apply to DTDs
An XML namespace prefix cannot be undeclared, but it can be overridden by redeclaring the same namespace prefix with some other URI.
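Several of the namespace points above can be seen in how Python's ElementTree reports names: it writes a qualified name as {namespace URI}local name. The URIs and element names below are invented for the sketch.

```python
import xml.etree.ElementTree as ET

document = """<inv:order xmlns:inv="urn:example:inventory" xmlns="urn:example:default"
           id="7" inv:priority="high">
  <item/>
  <local xmlns=""/>
</inv:order>"""

order = ET.fromstring(document)

names = {
    "order": order.tag,
    "attributes": sorted(order.attrib),
    "children": [child.tag for child in order],
}

for key, value in names.items():
    print(key, value)
```

The unprefixed attribute id stays outside any namespace even though a default namespace is in scope, the unprefixed child item picks up the default namespace, and the empty declaration on local disables it.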
XLinks and XPointers
Links to external resources such as other XML documents, HTML documents or images
Utility:
1. To define relationships between similar documents
2. To define a sequence in which documents should be navigated
3. To embed non-XML content in an XML document
XLink attributes:
1. Type (possible values are: simple, extended, resource, locator, arc, title)
2. Title - human-readable string
3. Href - destination URI of the link
4. Role - function of the link's content
5. Arcrole - function of the link
6. Show - how to render the link (new, replace, embed, other, none)
7. Actuate - when to trigger the link (onRequest, onLoad, other, none)
Simple links (xlink:type = "simple") offer similar functionality to HTML hyperlinks while extended links offer greater capabilities
Simple links are a subset of extended links.
Simple links link two locations in one direction and the start of the link is always the declaration of the link itself.
Combinations like xlink:show = "replace" with xlink:actuate = "onLoad" do not make any sense.
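A sketch of what a simple link looks like in a document and how an application might read it, using Python's standard library. ElementTree has no XLink support, so the code only reads the attributes; it does not follow or render links. simple_links is a hypothetical helper, and the element names and URL are invented.

```python
import xml.etree.ElementTree as ET

XLINK = "http://www.w3.org/1999/xlink"

document = f"""<catalog xmlns:xlink="{XLINK}">
  <entry xlink:type="simple"
         xlink:href="http://example.com/specs/widget.xml"
         xlink:title="Widget specification"
         xlink:show="new"
         xlink:actuate="onRequest">Widget spec</entry>
  <entry>No link here</entry>
</catalog>"""


def simple_links(root):
    """Collect (href, show, actuate) for every element with xlink:type = simple."""
    found = []
    for element in root.iter():
        if element.get(f"{{{XLINK}}}type") == "simple":
            found.append(
                (
                    element.get(f"{{{XLINK}}}href"),
                    element.get(f"{{{XLINK}}}show"),
                    element.get(f"{{{XLINK}}}actuate"),
                )
            )
    return found


links = simple_links(ET.fromstring(document))
print(links)
```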
Extended links allow more than one resource to be linked together and they may be specified out-of-line
3 types of extended links: inbound, outbound, third-party
Elements that have extended XLink attributes have 4 sub-elements (locator element, resource element, arc element and title element) and 3 attributes: type, title, role
Extended links do not imply that their source is the document in which the link is located.
Locator element: To specify the locations participating in an extended link. Attributes: href, role, title, label
Resource element: To define participants in the link that are within the scope of extended link element. Attributes: role, title, label
Arc element: To define the navigable connections between locators participating in an extended link. Attributes: arcrole, title, show, actuate, from, to
Title element: Attributes: type
Inline links: Extended links may be embedded in one of the resources participating in an extended link.
Out-of-line extended links: a special type of arc element is used to indicate to an XLink-aware processor that an out-of-line link exists for a particular document.
XPointer: used to point to some portion of an XML document, such as an individual sub-tree, attributes or even individual characters that are part of the text content.
HTML pointers use "#" (fragment identifier) to indicate that the text following it refers to a named anchor point, or fragment identifier, in the targeted document.
3 ways to specify fragment identifiers: Bare names, Child Sequences, Full XPointers
Bare Names: Shorthand notation is provided for pointing to elements with IDs
Child sequences: elements are pointed to by walking through the child element tree, e.g. /1/1/4/2
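The walk described by a child sequence can be sketched in a few lines of Python. This is not an XPointer implementation; resolve_child_sequence is a hypothetical helper that assumes the first step is 1 (the document element) and that every later step is a 1-based position among the element children of the node reached so far. The sample document is invented.

```python
import xml.etree.ElementTree as ET


def resolve_child_sequence(root, sequence):
    """Resolve a child sequence such as "/1/2/1" against the document element.

    Returns the element reached, or None when a step points past the
    available children or the first step is not 1.
    """
    steps = [int(step) for step in sequence.strip("/").split("/")]
    if steps[0] != 1:
        return None
    node = root
    for step in steps[1:]:
        children = list(node)  # element children only; comments and PIs are dropped
        if step < 1 or step > len(children):
            return None
        node = children[step - 1]
    return node


book = ET.fromstring(
    "<book>"
    "<chapter><title>Intro</title><para>First</para></chapter>"
    "<chapter><title>Next</title></chapter>"
    "</book>"
)

first_para = resolve_child_sequence(book, "/1/1/2")
second_title = resolve_child_sequence(book, "/1/2/1")
missing = resolve_child_sequence(book, "/1/3")

print(first_para.text, second_title.text, missing)
```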
Points: point location may be a node or a particular location within character content