
Parsing large file using DOM

 
Ranch Hand
Posts: 183
I'm new to XML, so please pardon my open-ended questions. I'm using the DOM approach to parse a large XML file (over 15MB) using the following code:


This was working fine for a while, though it did seem to use a lot of RAM: I had to set the VM heap to 256MB to get it to run, because I was getting a java.lang.OutOfMemoryError. I'm now using a different VM and I'm getting the same error even on smaller files that used to work fine.
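Since the error came back after switching VMs, one quick check is whether the new VM actually received the larger heap setting. Below is a minimal sketch using the standard `Runtime.maxMemory()` call; the class name `HeapCheck` is made up for illustration, and `-Xmx256m` is the usual HotSpot-style flag, which other VMs may spell differently.

```java
public class HeapCheck {

    /** Maximum heap the JVM will try to use, in megabytes (from Runtime.maxMemory()). */
    static long maxHeapMb() {
        // maxMemory() returns Long.MAX_VALUE if the VM imposes no limit
        return Runtime.getRuntime().maxMemory() / (1024 * 1024);
    }

    public static void main(String[] args) {
        // Run this under both VMs with the same options, e.g.
        // "java -Xmx256m HeapCheck" on a HotSpot-style VM, and compare the numbers.
        System.out.println("Max heap: " + maxHeapMb() + " MB");
    }
}
```

The reported figure can be somewhat below the `-Xmx` value depending on the garbage collector, but if the new VM shows a much smaller number, the heap option was probably not passed or not recognized.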

I'm trying to decide between two approaches to resolving this:
1) tune garbage collection so that memory is reclaimed immediately after each top-level element has been stored in the database, or
2) switch from the DOM approach to one that doesn't parse the entire file into memory first. I'm leaning towards this approach, as my XML file could get much larger than 15MB, though I've never used it before and could use some pointers on where to learn about it.

Which approach would you recommend? If #2, do you have any suggestions for where I could learn this approach quickly? (Searching on XML yields n! hits...)

ms
 
Ranch Hand
Posts: 192
Hi,
If the XML file you are parsing is very large, the best option is neither DOM nor plain SAX.
The best option is SAX Extensions, which use SAX together with filters.

You can read Brett McLaughlin's XML tip on this at IBM developerWorks.
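As a rough illustration of the filter idea (not necessarily the exact technique in that tip), the standard `org.xml.sax.helpers.XMLFilterImpl` class can sit between the SAX parser and your handler and drop the events you don't care about, so the handler only ever sees the interesting parts of the document. The filter class `KeepOnlyFilter`, the element name `item`, and the counting handler are hypothetical.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;
import org.xml.sax.helpers.XMLFilterImpl;

public class SaxFilterDemo {

    /** Forwards only events inside elements with the given name; drops everything else. */
    static class KeepOnlyFilter extends XMLFilterImpl {
        private final String keep;
        private int insideKeep = 0;   // nesting depth inside a kept element

        KeepOnlyFilter(XMLReader parent, String keep) {
            super(parent);
            this.keep = keep;
        }

        @Override
        public void startElement(String uri, String local, String qName, Attributes atts)
                throws SAXException {
            if (insideKeep > 0 || qName.equals(keep)) {
                insideKeep++;
                super.startElement(uri, local, qName, atts);   // pass downstream
            }
        }

        @Override
        public void endElement(String uri, String local, String qName) throws SAXException {
            if (insideKeep > 0) {
                insideKeep--;
                super.endElement(uri, local, qName);
            }
        }

        @Override
        public void characters(char[] ch, int start, int length) throws SAXException {
            if (insideKeep > 0) super.characters(ch, start, length);
        }
    }

    /** Parses the XML through the filter and counts the start tags that reach the handler. */
    static int countForwardedElements(String xml, String keep) throws Exception {
        XMLReader parser = SAXParserFactory.newInstance().newSAXParser().getXMLReader();
        KeepOnlyFilter filter = new KeepOnlyFilter(parser, keep);
        int[] count = {0};
        filter.setContentHandler(new DefaultHandler() {
            @Override
            public void startElement(String uri, String local, String qName, Attributes atts) {
                count[0]++;
            }
        });
        filter.parse(new InputSource(new StringReader(xml)));
        return count[0];
    }

    public static void main(String[] args) throws Exception {
        String xml = "<doc><skip><x/></skip><item><x/></item><item/></doc>";
        System.out.println(countForwardedElements(xml, "item"));   // prints 3
    }
}
```

Filters can be chained (each one takes the previous one as its parent), which keeps each step small while the document is still streamed rather than loaded whole.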

Hope it helps.
 
Author and all-around good cowpoke
Posts: 13078
The memory used to build a DOM is much larger than the source document: every element is turned into a Java object, and of course the text is stored as 16-bit Unicode chars. More frequent GC is not going to help, because the whole tree stays reachable until you are finished with it.
SAX-style processing is the only feasible way to go.
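A minimal sketch of SAX-style processing for this case, assuming each child of the root element is one record whose children are simple text fields. The class name `SaxRecordHandler`, that record layout, and the `sink` callback (standing in for the database insert) are all hypothetical. Only the record currently being read is held in memory, so heap use should stay roughly flat as the file grows.

```java
import java.io.StringReader;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Consumer;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

public class SaxRecordHandler extends DefaultHandler {
    private final Consumer<Map<String, String>> sink;  // hypothetical: e.g. does the DB insert
    private int depth = 0;                  // 1 = root, 2 = record, 3 = field
    private Map<String, String> record;     // the only record held in memory
    private final StringBuilder text = new StringBuilder();

    SaxRecordHandler(Consumer<Map<String, String>> sink) {
        this.sink = sink;
    }

    @Override
    public void startElement(String uri, String local, String qName, Attributes atts) {
        depth++;
        if (depth == 2) {
            record = new LinkedHashMap<>();   // a new top-level record begins
        } else if (depth == 3) {
            text.setLength(0);                // a new field begins
        }
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        // SAX may deliver one text node in several chunks, so accumulate
        if (depth == 3) text.append(ch, start, length);
    }

    @Override
    public void endElement(String uri, String local, String qName) {
        if (depth == 3) {
            record.put(qName, text.toString().trim());
        } else if (depth == 2) {
            sink.accept(record);   // store the finished record...
            record = null;         // ...then drop it so it can be garbage-collected
        }
        depth--;
    }

    /** Parses the document, passing each top-level record to the sink as it completes. */
    static void parse(InputSource source, Consumer<Map<String, String>> sink) throws Exception {
        SAXParserFactory.newInstance().newSAXParser().parse(source, new SaxRecordHandler(sink));
    }

    public static void main(String[] args) throws Exception {
        String xml = "<orders><order><id>1</id><item>pen</item></order>"
                + "<order><id>2</id><item>ink</item></order></orders>";
        parse(new InputSource(new StringReader(xml)), r -> System.out.println(r));
    }
}
```

For a real file, `SAXParser` also has a `parse(File, DefaultHandler)` overload, so the document is streamed from disk instead of being read into a string first.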
Bill
 