StAX: cursor-based parsing

 
Ranch Hand
Posts: 296
Spring
Hello Folks!
I'm trying to use StAX and have a couple of questions.
What are the benefits and use cases of this kind of stream-based parsing? All I know so far is that if the XML is large I should prefer SAX, and if it is small, DOM.
Is this a technology that lets me, for example, read XML from a Socket's InputStream and start parsing as the data arrives (without waiting for it to download 100%)?
 
Author and all-around good cowpoke
Posts: 13078
6
Advantages of stream-based parsing: low memory requirement, and it can start on an incomplete stream from, say, a socket.

Disadvantages: tricky to program anything past the simplest data grabber. If you need to manipulate a complex hierarchy, use DOM.

DOM advantages: a very useful API that can manipulate a complex hierarchy.

DOM disadvantages: needs more memory, and the entire document has to be parsed before you can work with it.
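
The streaming side of this comparison can be sketched with the cursor API (javax.xml.stream.XMLStreamReader). This is only a minimal illustration: the class name, the <title> element and the sample document are made up for the example, and the in-memory stream stands in for any InputStream, such as one obtained from a socket.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

class StaxCursorDemo {

    // Collects the text of every <title> element, pulling one event at a time.
    // Only the current event is held in memory, never the whole document.
    static List<String> readTitles(InputStream in) throws XMLStreamException {
        XMLStreamReader reader = XMLInputFactory.newInstance().createXMLStreamReader(in);
        List<String> titles = new ArrayList<>();
        try {
            while (reader.hasNext()) {
                int event = reader.next();
                if (event == XMLStreamConstants.START_ELEMENT
                        && "title".equals(reader.getLocalName())) {
                    // getElementText() reads text up to the matching end tag.
                    titles.add(reader.getElementText());
                }
            }
        } finally {
            reader.close();
        }
        return titles;
    }

    public static void main(String[] args) throws XMLStreamException {
        // Hypothetical sample data; a socket's InputStream would be used the same way.
        String xml = "<books><book><title>A</title></book><book><title>B</title></book></books>";
        InputStream in = new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8));
        System.out.println(readTitles(in)); // prints [A, B]
    }
}
```

The loop only ever looks at the current event, which is where the low memory footprint comes from.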

Bill
 
Ranch Hand
Posts: 32
1
It's also worth noting that StAX and SAX work slightly differently: StAX is often referred to as a 'pull' parser whilst SAX is a 'push' parser. What this means is quite simple really: with StAX your code requests the next element, while SAX calls your code to tell you that it already has an element in hand. To put it differently, with StAX you ask the parser for whatever comes next in the file, whilst SAX tells you what it has just read. This leads on to one other advantage of StAX: it is possible to both read and write the XML markup using streams (read elements from one stream and write them to a second one) and thus reduce the memory footprint. I have used this technique to modify OOXML spreadsheet files, which are often so large that they cause performance issues with DOM-based parsers or APIs like POI.
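
To make the push/pull distinction concrete, here is a small sketch that counts elements both ways. The class and method names are invented for the illustration; the parser classes themselves are the standard JDK ones.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamReader;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

class PushVsPullDemo {

    // Push (SAX): the parser owns the loop and calls back into our handler.
    static int countWithSax(String xml) throws Exception {
        final int[] count = {0};
        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName,
                                     String qName, Attributes attrs) {
                count[0]++;
            }
        };
        SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), handler);
        return count[0];
    }

    // Pull (StAX): our code owns the loop and asks for the next event.
    static int countWithStax(String xml) throws Exception {
        XMLStreamReader reader = XMLInputFactory.newInstance()
                .createXMLStreamReader(new StringReader(xml));
        int count = 0;
        while (reader.hasNext()) {
            if (reader.next() == XMLStreamConstants.START_ELEMENT) {
                count++;
            }
        }
        reader.close();
        return count;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<a><b/><c/></a>"; // hypothetical sample document
        System.out.println(countWithSax(xml) + " " + countWithStax(xml));
    }
}
```

Because the StAX version owns the loop, it can stop early or interleave the reading with writing to another stream, which is awkward to do from inside a SAX callback.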
 
Marshal
Posts: 28193
95
Eclipse IDE Firefox Browser MySQL Database
And following on from that, it's not difficult to generate XML using StAX. That's serialization and not parsing, but it's still worth mentioning. It is possible to do serialization by generating SAX events, but it's rather cumbersome and not nearly as straightforward as the way StAX does it.
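
As a rough illustration of generating XML with StAX, the sketch below uses javax.xml.stream.XMLStreamWriter. The element name, attribute and class name are assumptions chosen for the example only.

```java
import java.io.StringWriter;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamWriter;

class StaxWriterDemo {

    // Writes a one-element document; the writer escapes special characters in text.
    static String writeGreeting(String text) throws XMLStreamException {
        StringWriter out = new StringWriter();
        XMLStreamWriter writer = XMLOutputFactory.newInstance().createXMLStreamWriter(out);
        writer.writeStartDocument();
        writer.writeStartElement("greeting");   // hypothetical element name
        writer.writeAttribute("lang", "en");    // hypothetical attribute
        writer.writeCharacters(text);
        writer.writeEndElement();
        writer.writeEndDocument();
        writer.close();
        return out.toString();
    }

    public static void main(String[] args) throws XMLStreamException {
        System.out.println(writeGreeting("a & b"));
    }
}
```

Each call maps directly to a piece of markup, with no handler or transformer set-up, which is presumably what makes it feel more straightforward than emitting SAX events.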
 
Ranch Hand
Posts: 734
7
An article may help more here, since it has room for more careful wording and for developing the various points:
http://docs.oracle.com/cd/E17802_01/webservices/webservices/docs/1.6/tutorial/doc/SJSXP2.html
 
surlac surlacovich
Ranch Hand
Posts: 296
Spring
Thanks guys!
Mark, a very clear answer. But if you were using SAX for your task (two streams, read from the first and write to the second), you would need to download the whole thing and save it to the HDD before you could parse it, right?
The difference from DOM is that after saving it to the HDD, you don't need to load the whole thing into RAM to write it to the output stream.

Paul Clapham wrote:It is possible to do serialization by generating SAX events, but it's rather cumbersome and not nearly as straightforward as the way StAX does it.


Yes, I've found the StAX way of writing XML very convenient (no need to create Transformer handlers and factories to map to an OutputStream; you use javax.xml.stream.XMLStreamWriter instead), but I've also found many similarities between the StAX and SAX ways of writing XML.
For now my view is that StAX is a slightly more complex tool but has less overhead (memory, CPU time), so it makes sense to learn it and stick to StAX for almost every task.
 
Mark Beardsley
Ranch Hand
Posts: 32
1
  • Likes 2
No, you should be able to parse the XML markup from the stream as you are reading it. The only example I have experience with is working with Excel workbooks stored on a server. Connecting a stream to one of them allowed me to parse the XML markup using StAX without any need to effectively copy the file onto my local hard drive. If I wanted to update the file, as was often the case, the process looked like this:

1. Open a stream onto the source file so that I can parse the XML.
2. Open another stream onto a local copy so that I can save the modified result.
3. Read from the source file and save elements to the local copy until I get to the point where the modification needs to be made.
4. Add/change the necessary elements and write them to the local copy.
5. Read any and all remaining elements from the source file and add them to the local copy.
6. Replace the source file with the modified/updated local copy.
7. Delete the local copy.
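
The read, modify and copy steps above can be sketched with the StAX event API (XMLEventReader and XMLEventWriter). This is not the original poster's code, only an assumed minimal version: the class name, the method and the element names in the sample are hypothetical, and it does not handle a target element nested inside another element of the same name.

```java
import java.io.Reader;
import java.io.StringReader;
import java.io.StringWriter;
import java.io.Writer;
import javax.xml.stream.XMLEventFactory;
import javax.xml.stream.XMLEventReader;
import javax.xml.stream.XMLEventWriter;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.events.XMLEvent;

class StaxCopyAndEditDemo {

    // Copies events from source to target, replacing the content of every
    // element named targetName with newText. Everything else passes through.
    static void copyReplacingText(Reader source, Writer target,
                                  String targetName, String newText)
            throws XMLStreamException {
        XMLEventReader reader = XMLInputFactory.newInstance().createXMLEventReader(source);
        XMLEventWriter writer = XMLOutputFactory.newInstance().createXMLEventWriter(target);
        XMLEventFactory events = XMLEventFactory.newInstance();
        boolean inside = false;
        while (reader.hasNext()) {
            XMLEvent event = reader.nextEvent();
            if (event.isStartElement()
                    && event.asStartElement().getName().getLocalPart().equals(targetName)) {
                inside = true;
                writer.add(event);                             // keep the start tag
                writer.add(events.createCharacters(newText));  // write the new content
            } else if (event.isEndElement()
                    && event.asEndElement().getName().getLocalPart().equals(targetName)) {
                inside = false;
                writer.add(event);                             // keep the end tag
            } else if (!inside) {
                writer.add(event);                             // unchanged pass-through
            }
            // Events inside the target element (the old content) are dropped.
        }
        writer.close();
        reader.close();
    }

    public static void main(String[] args) throws XMLStreamException {
        StringWriter out = new StringWriter();
        copyReplacingText(new StringReader("<row><c>1</c><v>old</v></row>"), out, "v", "new");
        System.out.println(out);
    }
}
```

In real use the Reader and Writer would wrap the source file stream and the local copy; replacing the source file and deleting the copy are ordinary file operations outside the parser.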

Of course, if all you wish to do is to read the contents of the file of xml markup then the process is much simpler and you should not need to make a local copy before parsing it.
 
surlac surlacovich
Ranch Hand
Posts: 296
Spring
Thanks Mark, that's a very nice use case for StAX.
So do you agree that StAX is all-sufficient, and that if one knows how to use it well, he/she doesn't even need to know DOM and SAX? Sometimes it's just easier to use, for example, DOM, but if you know StAX you can easily swap DOM for StAX and use StAX for the task.
 
William Brogden
Author and all-around good cowpoke
Posts: 13078
6
  • Likes 1

surlac surlacovich wrote:
So do you agree that StAX is all-sufficient, and that if one knows how to use it well, he/she doesn't even need to know DOM and SAX? Sometimes it's just easier to use, for example, DOM, but if you know StAX you can easily swap DOM for StAX and use StAX for the task.



Not really true. Any manipulation that involves more than one Element, such as changing the hierarchy or manipulating data from early in the document depending on later elements, would be outrageously difficult with only StAX. Those DOM manipulation methods are so powerful.

Bill
 
surlac surlacovich
Ranch Hand
Posts: 296
Spring
Thanks a lot, William!
So the algorithm should be like:

 
William Brogden
Author and all-around good cowpoke
Posts: 13078
6
Nope: The algorithm is:

1. Do I just need to pull some data items, or change individual data items without messing with the XML hierarchy? StAX is fine.
2. Do I need to do serious hierarchy-related data manipulation, and the XML is not huge? DOM rules.
3. Do I need to do serious hierarchy-related data manipulation, and the XML is huge? Time for some serious thinking about how to simplify the job with multiple passes, or for getting really, really deep into custom StAX programming.

Bill
 
Paul Clapham
Marshal
Posts: 28193
95
Eclipse IDE Firefox Browser MySQL Database
I frequently find myself with an XML document from which I need to extract a small amount of data. When this happens, XPath is often a useful tool to describe and locate the nodes I need. And since DOM supports XPath naturally and StAX doesn't, I use DOM. It's all about the time to create a working program, rather than any need to conserve memory.
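
A minimal sketch of that DOM plus XPath approach, using the standard javax.xml.xpath API. The class name, the sample document and the expression are invented for the illustration.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

class XPathOnDomDemo {

    // Parses the whole document into a DOM tree, then evaluates the
    // XPath expression against it and returns the result as a string.
    static String evaluate(String xml, String expression) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
        XPath xpath = XPathFactory.newInstance().newXPath();
        return xpath.evaluate(expression, doc);
    }

    public static void main(String[] args) throws Exception {
        // Hypothetical sample data.
        String xml = "<orders>"
                + "<order id='1'><total>10</total></order>"
                + "<order id='2'><total>25</total></order>"
                + "</orders>";
        System.out.println(evaluate(xml, "/orders/order[@id='2']/total")); // prints 25
    }
}
```

The whole tree is built in memory first, so this trades memory for a much shorter program, which matches the trade-off described above.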
 
surlac surlacovich
Ranch Hand
Posts: 296
Spring

William Brogden wrote:
3. Do I need to do serious hierarchy-related data manipulation, and the XML is huge? Time for some serious thinking about how to simplify the job with multiple passes, or for getting really, really deep into custom StAX programming.


Thanks Bill. I'm just not sure what you mean by multiple passes. Could you please say a little more about it (a link to an article would work too)?

Paul, thanks for your input. So I've found out that XPath is part of the XSL family of specifications, and that it can navigate nodes both backward and forward, so pull/push algorithms don't really fit XPath. I believe the same results XPath provides are achievable with StAX/SAX, but will take far more lines of code.
 
William Brogden
Author and all-around good cowpoke
Posts: 13078
6

Thanks Bill. I'm just not sure what you mean by multiple passes. Could you please say a little more about it (a link to an article would work too)?



I suspect that so far in your exploration of XML you have only seen simple documents. XML documents can get really really weird, especially if they have evolved over considerable time.

I had to work with a client whose mock exam input XML had multiple types of data all related to authoring and presentation of a single certification exam simulator. This document even included chunks of CDATA which were in fact valid XML documents, sigh....

The fact that so many kinds of data could be kept in a single document is one of the strengths of XML - but may require a bit of programming.

In this case, in order to create .PDF formatted sets of questions I had to take the DOM of the big document and extract selected bits to make a temporary XML document suitable for turning into PDF.

That's what I mean by multiple passes.

Bill
 
surlac surlacovich
Ranch Hand
Posts: 296
Spring
  • Likes 1
Sounds like everywhere else in programming: no silver bullet, right tool for the right task.
 
surlac surlacovich
Ranch Hand
Posts: 296
Spring
Can a memory-mapped file be used to guarantee that every edit made to the XML via DOM is saved to the HDD?
 
William Brogden
Author and all-around good cowpoke
Posts: 13078
6
  • Likes 1

surlac surlacovich wrote: Can a memory-mapped file be used to guarantee that every edit made to the XML via DOM is saved to the HDD?



No - just think about it for a minute.

The text length of XML elements gets changed by almost every operation. The entire DOM must be serialized to either rewrite over the file or write a new one.
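
A small sketch of why the whole document has to be written out again: changing one element's text shifts everything that follows it. The example uses the standard identity Transformer to serialize the DOM; the class name, element names and sample data are assumptions for the illustration.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

class DomEditAndSerializeDemo {

    // Changes the text of the first <name> element, then serializes the WHOLE tree.
    static String renameAndSerialize(String xml, String newName) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
        doc.getElementsByTagName("name").item(0).setTextContent(newName);

        // An identity transform writes the in-memory tree back out as markup.
        Transformer identity = TransformerFactory.newInstance().newTransformer();
        identity.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
        StringWriter out = new StringWriter();
        identity.transform(new DOMSource(doc), new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        // "Al" becomes "Alexander", so <age> moves to a different offset in the output.
        System.out.println(renameAndSerialize(
                "<user><name>Al</name><age>30</age></user>", "Alexander"));
    }
}
```

Since the edited text is longer than the original, every byte after it lands at a new position, so the change cannot simply be patched in place in the file.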

Bill
 
surlac surlacovich
Ranch Hand
Posts: 296
Spring

William Brogden wrote:
The text length of XML elements gets changed by almost every operation.


Does even searching for an element involve modifying the DOM?
 
William Brogden
Author and all-around good cowpoke
Posts: 13078
6

surlac surlacovich wrote:

William Brogden wrote:
The text length of XML elements gets changed by almost every operation.


Does even searching for an element involve modifying the DOM?



No, but your question used the words "make sure every edit"
 
Paul Clapham
Marshal
Posts: 28193
95
Eclipse IDE Firefox Browser MySQL Database
Text editors (e.g. Notepad, MS Word and so on) don't save every edit to disk immediately anyway, and nobody seems to mind that. So requiring an XML editor to save every edit to disk wouldn't really be reasonable.
 
surlac surlacovich
Ranch Hand
Posts: 296
Spring
Thank you very much, folks!
 