Holy Batman Bloatware!

 
Greenhorn
Posts: 8
As many of you are no doubt aware, when using Xerces (or XML4J), parsing a simple 1 MB XML document into a DOM requires roughly 6 terabytes of RAM.
OK, maybe I'm exaggerating a little. But the fact remains: the IBM implementation of the DOM uses insane amounts of RAM.
Does anyone know of a DOM implementation that pages portions of itself in and out of memory, as required?
Rob

 
Author and all-around good cowpoke
Posts: 13078
That (the paging DOM) sounds like an interesting project!
It is not surprising how much memory the DOM uses, given the enormous number of objects even a relatively simple tag can create, but all of that is required by the W3C DOM API (org.w3c.dom).
JDOM is supposed to use considerably less memory.
Monster XML files are best tackled with a SAX parser.
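
To illustrate the SAX suggestion, here is a minimal sketch using the JAXP SAX API that ships with the JDK. It streams through a document and counts element names without ever building a tree, so memory use does not grow with document size. The class name ElementCounter and its helper methods are made up for this example; they are not part of Xerces or any other library.

```java
import java.io.StringReader;
import java.util.Map;
import java.util.TreeMap;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical example class: counts element names while the parser streams
// through the document. Only the small map of counts is kept in memory.
final class ElementCounter extends DefaultHandler {
    private final Map<String, Integer> counts = new TreeMap<>();

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        // Called once per opening tag; nothing about the element is retained
        // except the running count for its name.
        counts.merge(qName, 1, Integer::sum);
    }

    // Parses any SAX input source (file, stream, reader) and returns the counts.
    static Map<String, Integer> count(InputSource source) throws Exception {
        ElementCounter handler = new ElementCounter();
        SAXParserFactory.newInstance().newSAXParser().parse(source, handler);
        return handler.counts;
    }

    // Convenience wrapper for in-memory XML; wraps checked exceptions.
    static Map<String, Integer> countString(String xml) {
        try {
            return count(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalArgumentException("could not parse XML", e);
        }
    }
}
```

For a large file you would pass new InputSource(new FileInputStream(...)) to count() instead of a string.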
 
Rob Ward
Greenhorn
Posts: 8
I did think about implementing the paging DOM thing, but I lack the time I would need to do it properly. It would be an excellent addition to the Xerces project, though.
Yes, I agree SAX is the only option with large XML documents. The difficulty comes when you want to traverse child nodes, access specific nodes, or do other DOM-like things. It can be done; it's just more work than using the DOM.
Cheers,
Rob
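
The extra work of reaching "specific nodes" with SAX mostly comes down to tracking where you are in the document yourself. A rough sketch of one way to do that, assuming you only need the text of elements at a known path: keep a list of currently open elements and collect text while the path matches. PathTextCollector and the slash-separated path syntax are invented for this example (it is not XPath).

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical example class: collects the text of every element whose path
// from the root equals targetPath, e.g. "/catalog/book/title".
final class PathTextCollector extends DefaultHandler {
    private final String targetPath;
    private final List<String> openElements = new ArrayList<>(); // used as a stack
    private final List<String> matches = new ArrayList<>();
    private StringBuilder text; // non-null only while inside a matching element

    PathTextCollector(String targetPath) {
        this.targetPath = targetPath;
    }

    private boolean atTarget() {
        return ("/" + String.join("/", openElements)).equals(targetPath);
    }

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        openElements.add(qName);
        if (atTarget()) {
            text = new StringBuilder();
        }
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        // May be called several times per element, so append rather than assign.
        // Text of nested child elements is included as well.
        if (text != null) {
            text.append(ch, start, length);
        }
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        if (text != null && atTarget()) {
            matches.add(text.toString());
            text = null;
        }
        openElements.remove(openElements.size() - 1);
    }

    // Convenience wrapper for in-memory XML; wraps checked exceptions.
    static List<String> collect(String xml, String targetPath) {
        PathTextCollector handler = new PathTextCollector(targetPath);
        try {
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), handler);
        } catch (Exception e) {
            throw new IllegalArgumentException("could not parse XML", e);
        }
        return handler.matches;
    }
}
```

This is the bookkeeping the DOM does for you; with SAX it is a dozen lines per kind of lookup, but nothing outside the matching elements is ever stored.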
 
Sheriff
Posts: 7001
Just because DOM is so heavyweight doesn't mean you have to go for a solely SAX-like streaming approach. I have used a "minimal object model" approach quite successfully in the past, where I parsed all the tags using SAX or some other streaming parser into a hierarchical object structure containing only the information and relationships I was likely to need.
Works really well, and doesn't gobble up memory.
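
A minimal sketch of what such a "minimal object model" might look like, under the assumption that only element names, text, and parent/child relationships are needed (attributes, comments, and namespaces are simply dropped). LiteNode and LiteTreeBuilder are hypothetical names for this illustration, not the poster's actual code.

```java
import java.io.StringReader;
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical lightweight node: one object per element, nothing else.
final class LiteNode {
    final String name;
    final List<LiteNode> children = new ArrayList<>();
    private final StringBuilder text = new StringBuilder();

    LiteNode(String name) {
        this.name = name;
    }

    void appendText(char[] ch, int start, int length) {
        text.append(ch, start, length);
    }

    // Direct text content with surrounding whitespace removed.
    String text() {
        return text.toString().trim();
    }

    // First child with the given name, or null if there is none.
    LiteNode child(String childName) {
        for (LiteNode c : children) {
            if (c.name.equals(childName)) {
                return c;
            }
        }
        return null;
    }
}

// Builds the LiteNode tree from SAX events using a stack of open elements.
final class LiteTreeBuilder extends DefaultHandler {
    private final Deque<LiteNode> open = new ArrayDeque<>();
    private LiteNode root;

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        LiteNode node = new LiteNode(qName); // attributes are deliberately ignored
        if (open.isEmpty()) {
            root = node;
        } else {
            open.peek().children.add(node);
        }
        open.push(node);
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        if (!open.isEmpty()) {
            open.peek().appendText(ch, start, length);
        }
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        open.pop();
    }

    // Convenience wrapper for in-memory XML; wraps checked exceptions.
    static LiteNode parseString(String xml) {
        LiteTreeBuilder builder = new LiteTreeBuilder();
        try {
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), builder);
        } catch (Exception e) {
            throw new IllegalArgumentException("could not parse XML", e);
        }
        return builder.root;
    }
}
```

To save more memory, startElement could skip elements the application never reads, so that whole subtrees are never materialized.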
 