Which XML technique to use?

 
Greenhorn
I am starting a new project and I need to take some input in the form of XML. Being an absolute greenhorn, and faced with the plethora of technologies available in Java for parsing XML, I am a little lost at sea.
So what do you experts suggest for a small project that needs XML parsing (and validating of course). I have checked xerces (SAX only), JDOM and JAXB. What would be the easiest and most current way of parsing XML?

Thanks.
 
Bartender
Hi,

I use this decision tree when selecting an approach:

XML usage falls into one of these patterns:
- Read > Extract
- Generate > Write
- Read > Extract > Modify > Write
- Read > Modify > Write

Read > Extract
------------------
Q1: Should my application logic convert the XML into a mirroring object model, instead of manipulating it as XML? What would the maintenance costs be?
[By a "mirroring" object model, I mean one where the XML elements and attributes map exactly onto the object model, using something like JAXB / JiBX / XMLBeans / XStream.]
Answer "No"
- if XML schema is not available at all
- if XML is not well formed
- if the XML schema is liable to change or is not under your control. Frequency of changes is a factor: high frequency means your object model also has to change often.
- if representing XML contents as object model does not fit in very well (does not "feel right") with the rest of the logic

Answer "Yes"
- if XML is well formed + representing XML contents as object model fits in very well with the rest of the logic

Q2: It's a Yes to Q1, but the XML size is huge (100s of MBs to GBs) and I'm afraid of memory footprint. Should I still proceed with object mapping?
Answer:
Option 1: Though the size may be huge, does the app really need all the data to be read up front?
If not (i.e., the app has wriggle room to load and use partial chunks at a time), then using SAXSource or StAXSource along with the JAXB Unmarshaller Listener concept is an effective way to keep both memory footprint and app response times low.
It's basically load-on-demand.
This concept is useful, for example, if the app is receiving XML over an HTTP channel (maybe from a RESTful service) and wants to use and show at least partial data to the user instead of waiting for all the data to arrive.
I don't know whether other mapping APIs support something similar.

Option 2: The overall size may be huge, but maybe the app needs only a portion of the data and prefers that portion in an object model.
If so, reduce the original XML to the required portion using the Transformation API with an XSLT stylesheet that holds the transformation rules (XSLT uses XPath expressions). But be cautious: if the original XML schema is not under my control, I will have to bear the maintenance cost of changing the XSLT to keep up with schema changes.
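A minimal sketch of Option 2 using only the JDK's `javax.xml.transform` API. It assumes a hypothetical input in which `<catalog>` contains `<book>` elements with `<title>` and `<price>` children, and that the app only needs the titles; the stylesheet and the `XsltReduce` class are illustrative, not a prescribed layout. The reduced XML could then be handed to whichever mapping API you picked.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

final class XsltReduce {
    // Hypothetical rules: keep only the book titles, drop everything else.
    // The XPath expressions in select="..." decide which portion survives.
    static final String XSLT =
        "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
      + "<xsl:output method='xml' omit-xml-declaration='yes'/>"
      + "<xsl:template match='/'>"
      + "<titles>"
      + "<xsl:for-each select='/catalog/book'>"
      + "<title><xsl:value-of select='title'/></title>"
      + "</xsl:for-each>"
      + "</titles>"
      + "</xsl:template>"
      + "</xsl:stylesheet>";

    static String reduce(String xml) {
        try {
            Transformer transformer = TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(XSLT)));
            StringWriter out = new StringWriter();
            transformer.transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalArgumentException("XSLT reduction failed", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<catalog>"
            + "<book id='1'><title>A</title><price>10</price></book>"
            + "<book id='2'><title>B</title><price>20</price></book>"
            + "</catalog>";
        System.out.println(reduce(xml));
    }
}
```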

Q3: It's a No to Q1...can't use - or don't want to use - object mapping for the reading part. Now what?
Answer:
1. XMLs are "small" (i.e., even when fully loaded, you're OK with the memory footprint of the overall app; I use a 100 MB XML file size as my threshold for "small" on a desktop app).
XMLs are guaranteed to be well formed.
Want to concentrate on core app functionality more
Convenience and simplicity of XML reading more important than performance (response time) or memory.
=>
Use DOM: either the built-in DOM API (which is Xerces-based) or one of the other DOM APIs like dom4j or JDOM.
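For the DOM route, here is a minimal sketch with the JDK's built-in DOM API, assuming the same hypothetical `<catalog>/<book id>/<title>` layout (the `DomRead` class and its `id:title` output format are illustrative). Note the lookup-inside-a-loop shape of the navigation code; deeper documents turn this into the multi-nested loops mentioned in step 2.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

final class DomRead {
    // Loads the whole document into memory, then walks the tree by hand.
    static List<String> titles(String xml) {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            Document doc = builder.parse(new InputSource(new StringReader(xml)));
            List<String> result = new ArrayList<>();
            NodeList books = doc.getElementsByTagName("book");
            for (int i = 0; i < books.getLength(); i++) {
                Element book = (Element) books.item(i);
                NodeList titleNodes = book.getElementsByTagName("title");
                if (titleNodes.getLength() > 0) {
                    result.add(book.getAttribute("id") + ":" + titleNodes.item(0).getTextContent());
                }
            }
            return result;
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("Could not parse XML", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(titles("<catalog><book id='1'><title>A</title></book></catalog>"));
    }
}
```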

2. Same as step 1
+
Would also like to reduce those multi-nested loops to navigate my XML hierarchy and make maintenance easier
=> Use XPath API with DOM
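A sketch of XPath over DOM using the JDK's `javax.xml.xpath`, again on a hypothetical catalog layout. The expression `/catalog/book[price > 15]/title` is only an example; the point is that one expression replaces the hand-written navigation and filtering loops. `XPathRead` and `texts` are illustrative names.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

final class XPathRead {
    // Parses with DOM, then lets a single XPath expression do navigation and filtering.
    static List<String> texts(String xml, String expression) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
            XPath xpath = XPathFactory.newInstance().newXPath();
            NodeList nodes = (NodeList) xpath.evaluate(expression, doc, XPathConstants.NODESET);
            List<String> result = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                result.add(nodes.item(i).getTextContent());
            }
            return result;
        } catch (ParserConfigurationException | SAXException | IOException
                 | XPathExpressionException e) {
            throw new IllegalArgumentException("Could not evaluate XPath", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<catalog>"
            + "<book><title>A</title><price>10</price></book>"
            + "<book><title>B</title><price>20</price></book>"
            + "</catalog>";
        // Titles of books priced above 15, with no nested loops in Java code.
        System.out.println(texts(xml, "/catalog/book[price > 15]/title"));
    }
}
```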

3. XMLs may not be well formed
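=> Use SAX or StAX. They also stop at a well-formedness error, but only when they reach it, so everything before that point has already been processed. (Note: I'm not sure whether other DOM or DOM-like APIs are more lenient in this respect.)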

4. XMLs are huge (100s of MBs to GBs)
Memory footprint is likely to be too much
Have to reduce memory footprint and I'm ok with implementing potentially complicated state machine designs for managing the XML data
Partial loading (loading on demand) needed to keep application responsive
=>
Use SAX
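A minimal SAX sketch, assuming hypothetical `<title>` elements are the data of interest. Even for this trivial case the handler needs a flag and a text buffer, which hints at the state-machine work needed for real documents. Each title is handed to a caller-supplied `Consumer` as soon as it is complete, so nothing accumulates in memory unless the caller keeps it. `SaxRead` and `forEachTitle` are illustrative names.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.function.Consumer;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

final class SaxRead {
    // Pushes each <title> text to the sink when its end tag is seen,
    // so memory use does not grow with document size.
    static void forEachTitle(String xml, Consumer<String> sink) {
        DefaultHandler handler = new DefaultHandler() {
            private boolean inTitle;                          // the (tiny) state machine
            private final StringBuilder text = new StringBuilder();

            @Override
            public void startElement(String uri, String localName, String qName, Attributes atts) {
                if ("title".equals(qName)) {
                    inTitle = true;
                    text.setLength(0);
                }
            }

            @Override
            public void characters(char[] ch, int start, int length) {
                // May be called several times for one text node, so accumulate.
                if (inTitle) {
                    text.append(ch, start, length);
                }
            }

            @Override
            public void endElement(String uri, String localName, String qName) {
                if ("title".equals(qName)) {
                    inTitle = false;
                    sink.accept(text.toString());
                }
            }
        };
        try {
            SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), handler);
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("Could not parse XML", e);
        }
    }

    public static void main(String[] args) {
        forEachTitle("<catalog><book><title>A</title></book></catalog>", System.out::println);
    }
}
```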

5. Same as step 4, but I have no compelling reason to use SAX. Is there something a bit easier?
=>
Use StAX (Streaming API)
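The same extraction as a StAX sketch, for comparison. With the pull model the application drives the loop, so simple cases need no explicit state flags; `getElementText()` reads the text of a text-only element and leaves the reader on its end tag. Element names and the `StaxRead` class are illustrative.

```java
import java.io.StringReader;
import java.util.function.Consumer;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

final class StaxRead {
    // Pull parsing: the loop asks for the next event instead of receiving callbacks.
    static void forEachTitle(String xml, Consumer<String> sink) {
        try {
            XMLStreamReader reader = XMLInputFactory.newInstance()
                .createXMLStreamReader(new StringReader(xml));
            try {
                while (reader.hasNext()) {
                    if (reader.next() == XMLStreamConstants.START_ELEMENT
                            && "title".equals(reader.getLocalName())) {
                        // Reads the text-only content up to the matching </title>.
                        sink.accept(reader.getElementText());
                    }
                }
            } finally {
                reader.close(); // XMLStreamReader does not implement AutoCloseable
            }
        } catch (XMLStreamException e) {
            throw new IllegalArgumentException("Could not read XML", e);
        }
    }

    public static void main(String[] args) {
        forEachTitle("<catalog><book><title>A</title></book></catalog>", System.out::println);
    }
}
```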

Generate > Write
--------------------
1. The contents of XML are - or can be - represented as objects in my application logic
=> Use JAXB / XMLBeans / XStream / JiBX or other object-to-XML mapping APIs

2. The contents of XML for whatever reason cannot be represented as objects in my application logic, or are already present in a convenient tree data structure. It's more convenient for my application logic to "know" about the XML.
=> Use DOM
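A minimal sketch of generating XML with DOM and writing it out through an identity `Transformer`, a common way to serialize a DOM tree with only the JDK. The `<catalog>/<book>/<title>` structure and the `DomWrite` class are hypothetical. Text set with `setTextContent` is escaped by the serializer, so a title such as `A & B` comes out as `A &amp; B`.

```java
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

final class DomWrite {
    static String buildCatalog(String... titles) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
            Element catalog = doc.createElement("catalog");
            doc.appendChild(catalog);
            for (int i = 0; i < titles.length; i++) {
                Element book = doc.createElement("book");
                book.setAttribute("id", String.valueOf(i + 1));
                Element title = doc.createElement("title");
                title.setTextContent(titles[i]); // escaped when serialized
                book.appendChild(title);
                catalog.appendChild(book);
            }
            // newTransformer() with no stylesheet = identity transform (DOM in, text out).
            Transformer transformer = TransformerFactory.newInstance().newTransformer();
            transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            transformer.transform(new DOMSource(doc), new StreamResult(out));
            return out.toString();
        } catch (ParserConfigurationException | TransformerException e) {
            throw new IllegalStateException("Could not build XML", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(buildCatalog("A & B", "C"));
    }
}
```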


Read > Extract > Modify > Write
-------------------------------------
Overall approach depends on the selected approach for the [Read > Extract] portion here. Use the same decision tree as Read > Extract above.

1. If JAXB was used to read into an object model, then stick with JAXB to write out (marshal).
Sometimes the output XML has to contain fancy things or conform to an external schema, and JAXB marshalling may not be up to the task. In that case, marshal to a DOMResult and then use the Transformation API or DOM's LSSerializer for more control over what gets written out and how.

2. If [Read > Extract] was done with DOM, stick with DOM and use LSSerializer to write.
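A sketch of the DOM read, modify, write cycle, using `LSSerializer` from the DOM Level 3 Load and Save API that ships with the JDK. The `retitle` operation, the `DomReadModifyWrite` class, and the element names are hypothetical; `xml-declaration` is one of the standard `DOMConfiguration` parameters for the serializer.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.w3c.dom.ls.DOMImplementationLS;
import org.w3c.dom.ls.LSSerializer;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

final class DomReadModifyWrite {
    // Read with DOM, change the tree in place, write it back with LSSerializer.
    static String retitle(String xml, String oldTitle, String newTitle) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
            NodeList titles = doc.getElementsByTagName("title");
            for (int i = 0; i < titles.getLength(); i++) {
                Element title = (Element) titles.item(i);
                if (oldTitle.equals(title.getTextContent())) {
                    title.setTextContent(newTitle);
                }
            }
            DOMImplementationLS ls =
                (DOMImplementationLS) doc.getImplementation().getFeature("LS", "3.0");
            LSSerializer serializer = ls.createLSSerializer();
            serializer.getDomConfig().setParameter("xml-declaration", false);
            return serializer.writeToString(doc);
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("Could not process XML", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(retitle("<catalog><book id='1'><title>A</title></book></catalog>", "A", "Z"));
    }
}
```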

3. If [Read > Extract] was done with SAX or StAX, the selected data was probably stored in a non-mirroring application object model. So use DOM and that model to build the output XML.
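A sketch of that combination under hypothetical assumptions: StAX pulls out only the `<price>` values, the app keeps just a count and a total (a model that does not mirror the XML at all), and DOM is then used to build a new `<summary>` document from that model. The names `StaxToDom`, `PriceSummary`, `summarize`, and `toXml` are illustrative. The resulting `Document` can be written out with `LSSerializer` or an identity `Transformer`.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

final class StaxToDom {
    // Hypothetical non-mirroring model: a summary, not a copy of the XML tree.
    static final class PriceSummary {
        int count;
        double total;
    }

    // Read > Extract with StAX: only <price> values are kept.
    static PriceSummary summarize(String xml) {
        PriceSummary summary = new PriceSummary();
        try {
            XMLStreamReader reader = XMLInputFactory.newInstance()
                .createXMLStreamReader(new StringReader(xml));
            try {
                while (reader.hasNext()) {
                    if (reader.next() == XMLStreamConstants.START_ELEMENT
                            && "price".equals(reader.getLocalName())) {
                        summary.count++;
                        summary.total += Double.parseDouble(reader.getElementText().trim());
                    }
                }
            } finally {
                reader.close();
            }
        } catch (XMLStreamException e) {
            throw new IllegalArgumentException("Could not read XML", e);
        }
        return summary;
    }

    // Write side: build the output document from the model with DOM.
    static Document toXml(PriceSummary summary) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
            Element root = doc.createElement("summary");
            root.setAttribute("count", String.valueOf(summary.count));
            root.setAttribute("total", String.valueOf(summary.total));
            doc.appendChild(root);
            return doc;
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException("Could not create DOM document", e);
        }
    }

    public static void main(String[] args) {
        PriceSummary s = summarize(
            "<catalog><book><price>10</price></book><book><price>20</price></book></catalog>");
        System.out.println(toXml(s).getDocumentElement().getAttribute("total"));
    }
}
```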

Read > Modify > Write
--------------------------
Sometimes all the app wants to do is take an input XML, treat it as XML, modify it, and write it out as a new XML. If the modifications don't need complex application logic or much external data, use the Transformation API with an XSLT stylesheet.
If the modifications need complex application logic, then it's a case of the [Read > Extract > Modify > Write] pattern.
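A sketch of the pure XSLT case: the well-known identity-transform stylesheet copies everything unchanged, and one extra empty template drops a hypothetical `<price>` element. The `price` template has a higher default priority than the generic `@*|node()` one, which is what makes this pattern easy to extend. `XsltModify` is an illustrative name.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

final class XsltModify {
    // Identity transform copies every attribute and node...
    // ...except <price>, which the empty template silently drops.
    static final String XSLT =
        "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
      + "<xsl:output method='xml' omit-xml-declaration='yes'/>"
      + "<xsl:template match='@*|node()'>"
      + "<xsl:copy><xsl:apply-templates select='@*|node()'/></xsl:copy>"
      + "</xsl:template>"
      + "<xsl:template match='price'/>"
      + "</xsl:stylesheet>";

    static String transform(String xml) {
        try {
            StringWriter out = new StringWriter();
            TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(XSLT)))
                .transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalArgumentException("XSLT transformation failed", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(transform(
            "<catalog><book id='1'><title>A</title><price>10</price></book></catalog>"));
    }
}
```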

Note that there are some "special" but common cases I've not addressed:
- Saving XMLs into a database is probably easier with ORM packages capable of doing so, like Hibernate. I think Spring also has such support.
- There are other alternate object models like AXIOM which I've never tried out.

Selecting an XML API is a bit of a pain, and I don't like that there are so many options. Over time I've built up this kind of decision tree, and I think it would be a good idea to keep it updated with as many frameworks as possible and present it in a better visual structure. I've come across some articles that do this, but they're never comprehensive, leaving one wondering whether there's something even better out there. If you discover more info, please add it. Others are welcome to comment, correct, or add their own opinions and experiences too.
 
Ranjan Sinha
Greenhorn
Wow. Thanks for the detailed answer.
I am not expecting the XML file to be more than 200 KB in normal use cases; only under extreme conditions should it go beyond that. Further, I would be doing the Read -> Extract operation only, and the XML would be created outside my application. However, I would be creating a schema, and input XML must conform to it.

I have not yet decided on the object mapping though.
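Since the input will come from outside the application and must conform to your schema, validation can be done with the JDK's `javax.xml.validation` API whichever parsing approach you settle on. A minimal sketch, assuming a hypothetical schema with a `<book>` element containing `<title>` and a decimal `<price>`; the `SchemaCheck` class is illustrative. The same compiled `Schema` can also be attached to a `DocumentBuilderFactory` or `SAXParserFactory` via `setSchema` so that documents are validated while they are parsed.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import javax.xml.validation.Validator;
import org.xml.sax.SAXException;

final class SchemaCheck {
    // Compile the schema once; a Schema object is thread-safe and reusable.
    static Schema compile(String xsd) {
        try {
            SchemaFactory factory = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
            return factory.newSchema(new StreamSource(new StringReader(xsd)));
        } catch (SAXException e) {
            throw new IllegalArgumentException("Invalid schema", e);
        }
    }

    // Returns false if the document is malformed or does not conform to the schema.
    static boolean conforms(Schema schema, String xml) {
        Validator validator = schema.newValidator(); // a Validator is not thread-safe
        try {
            validator.validate(new StreamSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            return false;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        Schema schema = compile(
            "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'>"
          + "<xs:element name='book'><xs:complexType><xs:sequence>"
          + "<xs:element name='title' type='xs:string'/>"
          + "<xs:element name='price' type='xs:decimal'/>"
          + "</xs:sequence></xs:complexType></xs:element>"
          + "</xs:schema>");
        System.out.println(conforms(schema, "<book><title>A</title><price>10</price></book>"));
    }
}
```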
 
With a little knowledge, a cast iron skillet is non-stick and lasts a lifetime.
reply
    Bookmark Topic Watch Topic
  • New Topic