Hi,
I use this decision tree when selecting an approach:
XML usage falls into one of these patterns:
- Read > Extract
- Generate > Write
- Read > Extract > Modify > Write
- Read > Modify > Write
Read > Extract
------------------
Q1: Should my application logic manipulate the XML as XML, or should it convert the XML to a mirroring object model? What would the maintenance costs be? (Below, "Yes" means convert to an object model and "No" means work with the XML directly.)
[By "mirroring" object model, I mean that the XML elements and attributes have exact mappings to the object model, using something like JAXB / JiBX / XMLBeans / XStream.]
Answer "No"
- if XML schema is not available at all
- if XML is not well formed
- if XML schema is liable to change or is not under your control. Frequency of change is a factor - the more often the schema changes, the more often your object model has to change with it.
- if representing XML contents as object model does not fit in very well (does not "feel right") with the rest of the logic
Answer "Yes"
- if XML is well formed + representing XML contents as object model fits in very well with the rest of the logic
Q2: It's a Yes to Q1, but the XML size is huge (100s of MBs to GBs) and I'm afraid of memory footprint. Should I still proceed with object mapping?
Answer:
Option 1: Though size may be huge, does app really need all the data to be read upfront?
If no - if the app has wriggle room to load and use partial chunks at a time - then using SAXSource or StAXSource, along with the JAXB Unmarshaller Listener concept, is an effective way to keep both memory footprint and app response times low.
It's basically load-on-demand.
This concept is useful, for example, if the app is receiving XML over an HTTP channel (maybe from a RESTful service) and wants to use and show at least partial data to the user, instead of waiting for all the data to arrive.
I don't know whether other mapping APIs support something similar.
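To make the load-on-demand idea concrete, here is a minimal sketch that uses plain StAX only and leaves out the JAXB binding step. The element name `item` and the names `ChunkedReader` and `readItems` are made up for illustration. Each record is handed to a callback as soon as it has been parsed, so the app can use it before the rest of the document has arrived. With JAXB, the callback step would instead be an `Unmarshaller.unmarshal(reader, SomeClass.class)` call positioned on the record's start element.

```java
import java.io.Reader;
import java.io.StringReader;
import java.util.function.Consumer;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

class ChunkedReader {

    /**
     * Pulls events from the stream and passes the text of every
     * (hypothetical) <item> element to the callback as soon as it is read.
     * Returns the number of items seen.
     */
    static int readItems(Reader xml, Consumer<String> onItem) throws XMLStreamException {
        XMLStreamReader reader = XMLInputFactory.newFactory().createXMLStreamReader(xml);
        int count = 0;
        try {
            while (reader.hasNext()) {
                if (reader.next() == XMLStreamConstants.START_ELEMENT
                        && "item".equals(reader.getLocalName())) {
                    // getElementText() consumes the element up to its end tag
                    onItem.accept(reader.getElementText());
                    count++;
                }
            }
        } finally {
            reader.close();
        }
        return count;
    }

    public static void main(String[] args) throws XMLStreamException {
        String xml = "<items><item>first</item><item>second</item></items>";
        readItems(new StringReader(xml), item -> System.out.println("got " + item));
    }
}
```

Only one record is held in memory at a time; whatever the callback keeps is up to the app.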
Option 2: Overall size may be huge, but maybe app only needs a portion of data and prefers that portion in an object model.
If so, reduce the original XML to the required portion using the Transformation API with an XSLT that holds the transformation rules (XSLT uses XPath expressions). But be cautious. If the original XML schema is not in my control, I will have to bear the maintenance cost of changing the XSLT to keep up with schema changes.
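A minimal sketch of Option 2 with the JDK's Transformation API. The `orders` / `order` / `status` vocabulary and the class name `XmlReducer` are assumptions for illustration only. The stylesheet keeps just the orders that are still open; the reduced output is what would then be handed to the object mapper.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

class XmlReducer {

    // Hypothetical stylesheet: copy only <order> elements with status="open".
    static final String XSLT =
          "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
        + "<xsl:output method='xml' omit-xml-declaration='yes'/>"
        + "<xsl:template match='/orders'>"
        + "<orders><xsl:copy-of select=\"order[@status='open']\"/></orders>"
        + "</xsl:template>"
        + "</xsl:stylesheet>";

    /** Applies the stylesheet and returns the reduced document as a string. */
    static String reduce(String xml) throws TransformerException {
        Transformer transformer = TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(XSLT)));
        StringWriter out = new StringWriter();
        transformer.transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws TransformerException {
        String xml = "<orders>"
                + "<order id='1' status='open'/>"
                + "<order id='2' status='closed'/>"
                + "</orders>";
        System.out.println(reduce(xml));
    }
}
```

One caveat worth checking for your setup: as far as I know, XSLT 1.0 processors such as the JDK's built-in one read the whole input before producing output, so this shrinks what the object model has to hold rather than the memory used during the transformation itself.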
Q3: It's a No to Q1...can't use - or don't want to use - object mapping for the reading part. Now what?
Answer:
1. XMLs are "small" (i.e., even when fully loaded, you're ok with the memory footprint of the overall app - I keep 100 MB XML file size as my threshold for "small" on a desktop app).
XMLs are guaranteed to be well formed.
Want to concentrate on core app functionality more
Convenience and simplicity of XML reading more important than performance (response time) or memory.
=>
Use DOM: either the inbuilt DOM API (which is Xerces based) or one of the other DOM-like APIs such as dom4j or JDOM.
2. Same as step 1
+
Would also like to reduce those multi-nested loops to navigate my XML hierarchy and make maintenance easier
=> Use XPath API with DOM
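3. XMLs may not be well formed
=> Use SAX or StAX. They also stop with an error at the first well-formedness violation, but everything parsed up to that point has already been delivered to the app, whereas a failed DOM parse returns no document at all. (Note: I'm not sure whether other DOM or DOM-like APIs are more lenient on this aspect.)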
4. XMLs are huge (100s of MBs to GBs)
Memory footprint is likely to be too much
Have to reduce memory footprint and I'm ok with implementing potentially complicated state machine designs for managing the XML data
Partial loading (loading on demand) needed to keep application responsive
=>
Use SAX
5. Same as step 4 but I've no compelling reason to use SAX. Is there something a bit easier?
=>
Use StAX (Streaming API)
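To show what the "state machine" remark in step 4 means in practice, here is a minimal SAX sketch. The element name `title` and the class name `TitleCollector` are made up for illustration. Because SAX pushes events at the handler, the handler itself has to remember where it is in the document; here that state is a single field, but it grows quickly as the hierarchy gets deeper.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

class TitleCollector extends DefaultHandler {

    private final List<String> titles = new ArrayList<>();
    // State: non-null only while the parser is inside a <title> element.
    private StringBuilder current;

    @Override
    public void startElement(String uri, String localName, String qName, Attributes attrs) {
        if ("title".equals(qName)) {
            current = new StringBuilder();
        }
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        // Text may arrive in several pieces, so it is accumulated.
        if (current != null) {
            current.append(ch, start, length);
        }
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        if ("title".equals(qName)) {
            titles.add(current.toString());
            current = null;
        }
    }

    /** Parses the document and returns the text of every <title> element. */
    static List<String> collect(String xml)
            throws ParserConfigurationException, SAXException, IOException {
        TitleCollector handler = new TitleCollector();
        SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), handler);
        return handler.titles;
    }

    public static void main(String[] args) throws Exception {
        System.out.println(collect("<books><book><title>One</title></book></books>"));
    }
}
```

With StAX (step 5) the same job is a plain loop that asks the reader for the next event, so the position in the document can live in ordinary local variables and control flow instead of handler fields.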
Generate > Write
--------------------
1. The contents of XML are - or can be - represented as objects in my application logic
=> Use JAXB / XMLBeans / XStream / JiBX or other Object to XML mapping APIs
2. The contents of XML for whatever reason cannot be represented as objects in my application logic, or are already present in a convenient tree data structure. It's more convenient for my application logic to "know" about the XML.
=> Use DOM
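A minimal sketch of case 2, assuming the data sits in a simple map rather than in mapped objects. The `settings` / `setting` element names and the class name `DomWriter` are hypothetical. The document is built with the inbuilt DOM API and serialized through an identity `Transformer`, which takes care of escaping.

```java
import java.io.StringWriter;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

class DomWriter {

    /** Builds <settings><setting name="...">value</setting>...</settings> from the map. */
    static String build(Map<String, String> settings) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
        Element root = doc.createElement("settings");
        doc.appendChild(root);

        for (Map.Entry<String, String> entry : settings.entrySet()) {
            Element setting = doc.createElement("setting");
            setting.setAttribute("name", entry.getKey());
            setting.setTextContent(entry.getValue());
            root.appendChild(setting);
        }

        // A Transformer created without a stylesheet copies the source to the result.
        Transformer identity = TransformerFactory.newInstance().newTransformer();
        identity.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
        StringWriter out = new StringWriter();
        identity.transform(new DOMSource(doc), new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        Map<String, String> settings = new LinkedHashMap<>();
        settings.put("theme", "dark");
        settings.put("filter", "size < 10");
        System.out.println(build(settings));
    }
}
```

Going through DOM rather than string concatenation means special characters such as `<` in values are escaped by the serializer.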
Read > Extract > Modify > Write
-------------------------------------
Overall approach depends on the selected approach for the [Read > Extract] portion here. Use the same decision tree as Read > Extract above.
1. If JAXB was used to read into an object model, then stick with JAXB to write out (marshal).
Sometimes the output XML has to have fancy things or conform to an external schema, and JAXB marshalling may not be up to the task. In that case, marshal to a DOMResult and then use the transformation API or DOM's LSSerializer for more control over what gets written out and how.
2. If [Read > Extract] done with DOM, stick with DOM and then use LSSerializer to write.
3. If [Read > Extract] was done with SAX or StAX, the selected data was probably stored in a non-mirroring application object model. So use DOM and that model to build the output XML.
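A minimal sketch of case 2 (DOM all the way through), with XPath doing the extraction and LSSerializer doing the write. The `catalog` / `product` vocabulary, the `stock` and `status` attributes, and the class name `StockMarker` are assumptions for illustration.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.w3c.dom.ls.DOMImplementationLS;
import org.w3c.dom.ls.LSSerializer;
import org.xml.sax.InputSource;

class StockMarker {

    /** Adds status="soldout" to every <product> whose stock attribute is 0. */
    static String markSoldOut(String xml) throws Exception {
        // Read: load the whole document into a DOM tree.
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));

        // Extract: one XPath expression instead of nested loops over child nodes.
        NodeList hits = (NodeList) XPathFactory.newInstance().newXPath()
                .evaluate("//product[@stock='0']", doc, XPathConstants.NODESET);

        // Modify: change the selected nodes in place.
        for (int i = 0; i < hits.getLength(); i++) {
            ((Element) hits.item(i)).setAttribute("status", "soldout");
        }

        // Write: serialize with the DOM Level 3 Load and Save API.
        DOMImplementationLS ls =
                (DOMImplementationLS) doc.getImplementation().getFeature("LS", "3.0");
        LSSerializer serializer = ls.createLSSerializer();
        serializer.getDomConfig().setParameter("xml-declaration", false);
        return serializer.writeToString(doc);
    }

    public static void main(String[] args) throws Exception {
        String xml = "<catalog>"
                + "<product sku='a' stock='0'/>"
                + "<product sku='b' stock='3'/>"
                + "</catalog>";
        System.out.println(markSoldOut(xml));
    }
}
```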
Read > Modify > Write
--------------------------
Sometimes, all that the app wants to do is take an input XML, treat it as XML and modify it, and then write it out as a new XML. If the modifications don't need any complex application logic or much external data, then use the transformation API with an XSLT transformation.
If the modifications need complex application logic interjection, then it's a case of [Read > Extract > Modify > Write] pattern.
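For the simple case, a common XSLT pattern is an identity template that copies everything, plus small overriding templates for the parts that change. A minimal sketch follows; the element names `name`, `title` and `internal` and the class name `XmlRewriter` are made up for illustration. The stylesheet is compiled once into a `Templates` object so it can be reused across documents.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Templates;
import javax.xml.transform.TransformerConfigurationException;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

class XmlRewriter {

    // Hypothetical rules: rename <name> to <title>, drop <internal>, copy the rest.
    static final String XSLT =
          "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
        + "<xsl:output method='xml' omit-xml-declaration='yes'/>"
        // Identity template: copies every node and attribute unchanged.
        + "<xsl:template match='@*|node()'>"
        + "<xsl:copy><xsl:apply-templates select='@*|node()'/></xsl:copy>"
        + "</xsl:template>"
        // Override: rename the element but keep its attributes and content.
        + "<xsl:template match='name'>"
        + "<title><xsl:apply-templates select='@*|node()'/></title>"
        + "</xsl:template>"
        // Override: an empty template removes the element entirely.
        + "<xsl:template match='internal'/>"
        + "</xsl:stylesheet>";

    private final Templates templates;

    XmlRewriter() throws TransformerConfigurationException {
        // Compile once; Templates can be shared, Transformers are created per use.
        templates = TransformerFactory.newInstance()
                .newTemplates(new StreamSource(new StringReader(XSLT)));
    }

    String rewrite(String xml) throws TransformerException {
        StringWriter out = new StringWriter();
        templates.newTransformer()
                .transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws TransformerException {
        String xml = "<doc><name>x</name><internal>secret</internal><keep>y</keep></doc>";
        System.out.println(new XmlRewriter().rewrite(xml));
    }
}
```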
Note that there are some "special" but common cases I've not addressed:
- Saving XMLs into database is probably easier with ORM packages capable of doing so, like Hibernate. I think Spring also has such support.
- There are other alternate object models like AXIOM which I've never tried out.
Selecting an XML API is a bit of a pain, and I don't like the fact that there are too many options. Over time, I've built up this kind of a decision tree and I think it's a good idea to keep it updated with as many frameworks as possible and represented in a better visual structure. I've come across some articles that do this but they're never comprehensive, leaving one with the doubt whether there's something even better out there. If you discover some info, please add it. Others are welcome to comment or correct or add their opinions and experiences too.