Xalan memory problem - DOMSource vs. StreamSource

 
Ranch Hand
Posts: 53
I'm hoping someone can help me out. We're working on a project that runs a series of very simple stylesheets against some moderately sized XML files (a couple of megabytes up to about 10 MB) to extract a few lines of data. We'd hoped to parse the document once up front, then pass a DOM reference to each Transformer, but we've run into some problems.
If we call Transformer.transform(StreamSource, StreamResult) and just provide the XML file as the first parameter, things go fairly smoothly. But if we build a DOM from the file beforehand and pass a DOMSource to the Transformer, the app uses 200+ MB of memory (vs. less than 50 MB normally on a file under 10 MB) and takes considerably longer to complete.
Can someone clue me in on what we're doing wrong? Shouldn't a DOMSource be faster, if anything, than a StreamSource? Is the latter using SAX or something like that?
Thanks very much,
-tim
 
Ranch Hand
Posts: 18944
Hi Tim,
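Using a StreamSource just hands the raw XML to the XSLT processor, which parses it itself and builds whatever internal representation it needs.
Using a DOMSource, however, means a full DOM tree of your XML file has already been built in memory, and the processor then has to map that tree onto its own internal model on top of it (Xalan-J 2 uses its DTM, the Document Table Model, for this), so you end up holding the document twice. Building a DOM first is not necessary when the next step is an XSLT transform.
Actually, when it is given a stream, Xalan reads the input through SAX events rather than through a DOM. The processor still keeps its own model of the input so that XPath expressions can reach any node, but that model is designed to be lighter than a general-purpose DOM.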
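For reference, the two calls being compared can be sketched roughly as below. This is only an illustration using the standard JAXP classes; the class name, the tiny inline document, and the stylesheet are made up for the example and are not from the original project. Both paths produce the same result; the difference is that the DOM path builds an org.w3c.dom.Document before the processor ever sees the input.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.Source;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Hypothetical demo class: names and sample data are assumptions for illustration.
class SourceComparison {
    static final String XML =
        "<orders><order id=\"1\">widget</order><order id=\"2\">gadget</order></orders>";

    // A very simple stylesheet that extracts one value as plain text.
    static final String XSL =
        "<xsl:stylesheet version=\"1.0\" xmlns:xsl=\"http://www.w3.org/1999/XSL/Transform\">"
      + "<xsl:output method=\"text\"/>"
      + "<xsl:template match=\"/\">"
      + "<xsl:value-of select=\"/orders/order[@id='2']\"/>"
      + "</xsl:template>"
      + "</xsl:stylesheet>";

    // Runs the stylesheet over whatever Source is supplied and returns the text output.
    static String run(Source input) {
        try {
            Transformer t = TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(XSL)));
            StringWriter out = new StringWriter();
            t.transform(input, new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    // Stream path: the processor parses the XML itself.
    static String viaStream() {
        return run(new StreamSource(new StringReader(XML)));
    }

    // DOM path: a full DOM tree is built first, then handed to the processor.
    static String viaDom() {
        try {
            DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
            dbf.setNamespaceAware(true);
            Document doc = dbf.newDocumentBuilder()
                .parse(new InputSource(new StringReader(XML)));
            return run(new DOMSource(doc));
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(viaStream());
        System.out.println(viaDom());
    }
}
```

On a document this small the two paths are indistinguishable; the memory gap described in this thread would only show up on multi-megabyte inputs.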
As a rule of thumb, most people consider the SAX API faster than the DOM API on large files (>500K). This is because DOM builds an in-memory tree of all the data contained in the XML file (the org.w3c.dom.Document), whereas with SAX you alone decide what is kept in memory.
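To make the "you decide what is kept in memory" point concrete, here is a minimal SAX sketch. The element names and the SaxExtract class are invented for the example; the handler keeps only the text of each <name> element and lets everything else stream past without being stored.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical demo class: element names and sample data are assumptions.
class SaxExtract {

    // Collects the text of every <name> element and ignores all other content.
    static final class NameCollector extends DefaultHandler {
        final List<String> names = new ArrayList<>();
        private StringBuilder current; // non-null only while inside a <name> element

        @Override
        public void startElement(String uri, String localName, String qName, Attributes atts) {
            if ("name".equals(qName)) {
                current = new StringBuilder();
            }
        }

        @Override
        public void characters(char[] ch, int start, int length) {
            // Text may arrive in several chunks, so append rather than assign.
            if (current != null) {
                current.append(ch, start, length);
            }
        }

        @Override
        public void endElement(String uri, String localName, String qName) {
            if ("name".equals(qName)) {
                names.add(current.toString());
                current = null;
            }
        }
    }

    static List<String> extractNames(String xml) {
        try {
            NameCollector handler = new NameCollector();
            SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), handler);
            return handler.names;
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String xml = "<items>"
            + "<item><name>alpha</name><price>3</price></item>"
            + "<item><name>beta</name><price>5</price></item>"
            + "</items>";
        System.out.println(extractNames(xml));
    }
}
```

Memory use here grows with the number of values you choose to keep, not with the size of the input file.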
Cheers
 
Bob Dobalina
Ranch Hand
Posts: 53
Thanks Benoît, I was thinking it would be something like that, but I wasn't sure about the internals of an XSLT processor. For the conversion engine we're working on, we'd planned on passing a DOM reference around to the various stylesheets in the conversion, but it seems like a better solution at this point would be to just pass streams around.
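One way the "pass streams around" idea might look, as a sketch only: read the XML into a byte array once, compile each stylesheet once into a javax.xml.transform.Templates object, and give every transform a fresh stream over the same bytes. Holding the raw bytes in memory is an assumption made here for the example (it avoids re-reading the file, and raw bytes take far less room than a DOM tree); the class name, stylesheets, and sample data are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.StringReader;
import java.io.StringWriter;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import javax.xml.transform.Templates;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

// Hypothetical demo class: names, stylesheets, and sample data are assumptions.
class StreamPipeline {

    // Wraps an XPath expression in a minimal text-output stylesheet.
    static String sheet(String select) {
        return "<xsl:stylesheet version=\"1.0\" xmlns:xsl=\"http://www.w3.org/1999/XSL/Transform\">"
             + "<xsl:output method=\"text\"/>"
             + "<xsl:template match=\"/\"><xsl:value-of select=\"" + select + "\"/></xsl:template>"
             + "</xsl:stylesheet>";
    }

    // Compiles a stylesheet once; the resulting Templates object can be reused.
    static Templates compile(String xsl) {
        try {
            return TransformerFactory.newInstance()
                .newTemplates(new StreamSource(new StringReader(xsl)));
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    // Runs every compiled stylesheet over the same bytes, one fresh stream per run.
    static List<String> runAll(byte[] xml, List<Templates> sheets) {
        List<String> results = new ArrayList<>();
        try {
            for (Templates templates : sheets) {
                StringWriter out = new StringWriter();
                templates.newTransformer().transform(
                    new StreamSource(new ByteArrayInputStream(xml)),
                    new StreamResult(out));
                results.add(out.toString());
            }
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
        return results;
    }

    static List<String> demo() {
        byte[] xml = "<orders><order>widget</order><order>gadget</order></orders>"
            .getBytes(StandardCharsets.UTF_8);
        List<Templates> sheets = Arrays.asList(
            compile(sheet("count(/orders/order)")),
            compile(sheet("/orders/order[1]")));
        return runAll(xml, sheets);
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

The input is still parsed once per stylesheet, so this trades some repeated parsing for a much smaller memory footprint than the shared-DOM approach.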
Thanks,
-tim
 