Parsing and Storing HUGE XML files

 
Ranch Hand
Posts: 55
I have a huge XML file containing millions of records, which I receive from the customer daily. My requirement is to parse the file and store all of its records in a database. Because of the file's size, parsing it can run into memory issues. Is there a way to parse the file in chunks and store them in the DB without running into memory problems? I believe any XML technique like XPointer, XQuery or XPath will hold the document in a DOM, and that will be a problem.

Please let me know if anyone has implemented something like this in their work.

Thanks
Vicky
 
Marshal
Posts: 28425
Parsing with SAX or StAX should do what you want.
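
To make the StAX suggestion concrete, here is a minimal sketch of pulling records one at a time and handing them on in fixed-size batches, so that only one batch is in memory at any moment. The class name `StaxBatchReader`, the `recordElement` parameter and the `sink` callback are hypothetical names chosen for illustration, and the sketch assumes each record element contains only text. A real file with nested fields would need more handling.

```java
import java.io.Reader;
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

public class StaxBatchReader {

    /**
     * Streams the document and passes the text of each record element to
     * the sink in batches of at most batchSize. Returns the record count.
     * Assumption: record elements hold plain text only (no child elements).
     */
    public static int readInBatches(Reader xml, String recordElement, int batchSize,
                                    Consumer<List<String>> sink) throws XMLStreamException {
        XMLInputFactory factory = XMLInputFactory.newInstance();
        // Files come from outside, so switch off DTDs and external entities.
        factory.setProperty(XMLInputFactory.SUPPORT_DTD, false);
        factory.setProperty(XMLInputFactory.IS_SUPPORTING_EXTERNAL_ENTITIES, false);

        XMLStreamReader reader = factory.createXMLStreamReader(xml);
        List<String> batch = new ArrayList<>();
        int total = 0;
        try {
            while (reader.hasNext()) {
                if (reader.next() == XMLStreamConstants.START_ELEMENT
                        && recordElement.equals(reader.getLocalName())) {
                    // getElementText() reads up to the matching end tag.
                    batch.add(reader.getElementText());
                    total++;
                    if (batch.size() == batchSize) {
                        sink.accept(batch);
                        batch = new ArrayList<>(); // start a fresh batch
                    }
                }
            }
            if (!batch.isEmpty()) {
                sink.accept(batch); // flush the final partial batch
            }
        } finally {
            reader.close();
        }
        return total;
    }
}
```

In a real job the sink would probably write each batch to the database, for example with JDBC's `PreparedStatement.addBatch()` and `executeBatch()`, and commit before asking for the next batch.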
 
Author and all-around good cowpoke
Posts: 13078

I believe any XML technique like XPointer, XQuery or XPath will hold it in DOM and that will be a problem.



You are exactly right.

This may be a job for "pipeline"-style processing. I wrote two survey articles on XML pipelines (article 1 and article 2).

I strongly recommend Harold's online book chapter on SAX processing.

Bill

[ October 08, 2008: Message edited by: William Brogden ]
 
Ranch Hand
Posts: 47
As Paul Clapham said, you want SAX or StAX, or any other event-based XML parsing library (XPP3, etc.). Trying to load the document into a tree-based XML API will probably give you an OutOfMemoryError; you'll try playing with the heap size and get nowhere.
It will be less convenient, depending on the structure and complexity of the XML document, but at least you'll be able to process the file.
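
To show what "less convenient" means in practice, here is a rough sketch of a SAX handler that rebuilds one record at a time from the event stream. The class name `RecordHandler`, its constructor arguments and the element names are made up for illustration. It assumes flat records, meaning each record element has one level of child elements that hold text, and a parser that is not namespace aware, so `qName` is the plain element name.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Consumer;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

public class RecordHandler extends DefaultHandler {
    private final String recordElement;
    private final Consumer<Map<String, String>> sink;
    private final StringBuilder text = new StringBuilder();
    private Map<String, String> current; // null while outside a record

    public RecordHandler(String recordElement, Consumer<Map<String, String>> sink) {
        this.recordElement = recordElement;
        this.sink = sink;
    }

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        if (recordElement.equals(qName)) {
            current = new LinkedHashMap<>(); // a new record begins
        }
        text.setLength(0); // discard text collected before this tag
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        // The parser may deliver one text node in several calls, so append.
        if (current != null) {
            text.append(ch, start, length);
        }
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        if (current == null) {
            return;
        }
        if (recordElement.equals(qName)) {
            sink.accept(current); // record complete: hand it off, then forget it
            current = null;
        } else {
            current.put(qName, text.toString().trim());
        }
    }

    /** Convenience entry point: parse the source, sending each record to the sink. */
    public static void parse(InputSource source, String recordElement,
                             Consumer<Map<String, String>> sink) throws Exception {
        SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
        parser.parse(source, new RecordHandler(recordElement, sink));
    }
}
```

The handler keeps only the record currently being read, so memory use depends on the size of one record, not on the size of the file.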
 
Ranch Hand
Posts: 315
Hi,

I have the same issue to address, plus a few extra requirements.

1.) XSD validation: if the XML does not conform to the XSD, I need to save the reason and show it to the user.

2.) If the XML is fine, perform user validations and save the records into the DB.
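
For the first requirement, one possible approach is the standard `javax.xml.validation` API with an `ErrorHandler` that records each problem instead of stopping at the first one. The sketch below is only an illustration: the class name `XsdErrorCollector` and the message format are my own choices, not part of any library. As far as I know, validating a `StreamSource` does not require building a DOM, but that is worth checking against your JDK and file sizes.

```java
import java.util.ArrayList;
import java.util.List;
import javax.xml.XMLConstants;
import javax.xml.transform.Source;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import javax.xml.validation.Validator;
import org.xml.sax.ErrorHandler;
import org.xml.sax.SAXParseException;

public class XsdErrorCollector {

    /** Validates xml against xsd and returns one message per problem found. */
    public static List<String> validate(Source xsd, Source xml) throws Exception {
        SchemaFactory factory = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
        Schema schema = factory.newSchema(xsd);
        Validator validator = schema.newValidator();

        List<String> problems = new ArrayList<>();
        validator.setErrorHandler(new ErrorHandler() {
            @Override
            public void warning(SAXParseException e) {
                add("warning", e);
            }

            @Override
            public void error(SAXParseException e) {
                add("error", e); // recoverable: record it and keep validating
            }

            @Override
            public void fatalError(SAXParseException e) throws SAXParseException {
                add("fatal", e);
                throw e; // not well-formed: validation cannot continue
            }

            private void add(String level, SAXParseException e) {
                problems.add(level + " at line " + e.getLineNumber()
                        + ", column " + e.getColumnNumber() + ": " + e.getMessage());
            }
        });

        try {
            validator.validate(xml);
        } catch (SAXParseException fatal) {
            // Already recorded by fatalError() above.
        }
        return problems;
    }
}
```

An empty list means the document conforms to the schema. Otherwise the messages, with their line and column numbers, can be saved and shown to the user before any records go to the DB.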

I was thinking of using the Castor API.

But my concern is mapping around 150 fields to Java classes and then saving them into the DB.

Should I map Java classes to the XML fields using Castor and then do the validations,

or

should I use a SAX parser to parse the file and populate the fields one by one?

Please share your suggestions.

Thanks,
Neeraj.
 