Large XML

 
Ranch Hand
Posts: 1907
Hi,
I am trying to process large XML files (50 MB, sometimes 75 MB) with deeply nested nodes. Parsing them (which I am doing in TIBCO BusinessWorks) takes a long time, more than 5 minutes. TIBCO uses a DOM parser internally to parse the document.
What different approaches can we take to optimize this? Converting the file data to a byte array and then processing it? Or changing the parser by writing the code in Java?
 
Ranch Hand
Posts: 2187
75 MB is actually very small; 3-4 GB would be considered large. Anyway, you could process the data with a SAX-based application.

Note that DOM and SAX are APIs, not parsers. So it's not about "changing the parser"; the parser will most likely stay the same. What will change is the application that processes the data the parser hands to it.
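A minimal SAX sketch in plain Java (JAXP, bundled with the JDK), assuming the XML can be read as an InputStream. The handler reacts to each start tag as the parser reports it and keeps only a counter, so memory use does not grow with the document the way a DOM tree does. The class name SaxCountDemo and the sample document are made up for illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

public class SaxCountDemo {

    // Streams the document and counts start tags whose qualified name matches.
    // The handler keeps only a counter; no tree of the document is built.
    static int countElements(InputStream xml, String elementName) throws Exception {
        SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
        int[] count = {0};
        parser.parse(xml, new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName, String qName,
                                     Attributes attributes) {
                // Called once per start tag, in document order, while parsing runs.
                if (qName.equals(elementName)) {
                    count[0]++;
                }
            }
        });
        return count[0];
    }

    public static void main(String[] args) throws Exception {
        String xml = "<orders><order id='1'><item/></order><order id='2'/></orders>";
        InputStream in = new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8));
        System.out.println(countElements(in, "order")); // prints 2
    }
}
```

For a real file you would pass something like new BufferedInputStream(new FileInputStream(path)) instead of the in-memory sample.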
 
Author and all-around good cowpoke
Posts: 13078
What exactly are you trying to do with this file?

1. Extract only a few data items?
2. Perform complex queries?
3. Write a modified XML document?
etc.

Bill
 
Arjun Shastry
Ranch Hand
Posts: 1907
Hi,
I am planning to extract all sub-items, transform each of them to flat-file format, consolidate all the data and write it to a flat file. Transforming and writing to the flat file will be done by the TIBCO BW tool. So the entire process is:
1) Parse the XML.
2) Transform each sub-item to flat-file format.
3) Write all the sub-item data to the flat file.
Steps 2 and 3 also take a long time, but they cannot be modified. So I am looking at whether the parsing time can be reduced.
 
William Brogden
Author and all-around good cowpoke
Posts: 13078
Short answer - probably not.

Since parsing is the slowest step in practically every application that treats XML this way, you can be quite sure that a LOT of thought has gone into optimizing parsers.

Turning off any validation may help.
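A sketch of what "turning off validation" can look like with JAXP, assuming the JDK's built-in parser. setValidating(false) is already the JAXP default; the extra feature below is a Xerces-specific switch that stops a non-validating parser from fetching an external DTD named in the DOCTYPE. A parser that does not recognize it throws, so the sketch catches that and keeps the defaults. The class and method names are hypothetical.

```java
import java.io.StringReader;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXNotRecognizedException;
import org.xml.sax.SAXNotSupportedException;
import org.xml.sax.helpers.DefaultHandler;

public class NonValidatingSax {

    static SAXParserFactory nonValidatingFactory() {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setValidating(false); // no DTD validation (already the JAXP default)
        try {
            // Xerces-specific feature: do not load an external DTD when not validating.
            factory.setFeature(
                "http://apache.org/xml/features/nonvalidating/load-external-dtd", false);
        } catch (ParserConfigurationException | SAXNotRecognizedException
                 | SAXNotSupportedException e) {
            // This parser does not know the feature; keep its default behaviour.
        }
        return factory;
    }

    // Counts every element, using the factory configured above.
    static int countAllElements(String xml) throws Exception {
        SAXParser parser = nonValidatingFactory().newSAXParser();
        int[] count = {0};
        parser.parse(new InputSource(new StringReader(xml)), new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName, String qName,
                                     Attributes attributes) {
                count[0]++;
            }
        });
        return count[0];
    }

    public static void main(String[] args) throws Exception {
        // The DOCTYPE names a DTD file that does not exist. With external DTD
        // loading switched off, the parser does not try to open it.
        String xml = "<!DOCTYPE root SYSTEM \"missing.dtd\"><root><a/><b/></root>";
        System.out.println(countAllElements(xml)); // prints 3
    }
}
```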

Do you have any control over the way the XML is produced?

Bill
 
Arjun Shastry
Ranch Hand
Posts: 1907
No, we don't have any control over how the XML is produced. One possibility is to replace TIBCO BW with a Java component.
 
Ranch Hand
Posts: 643
Then you will have to replace BW with Java code that does the same thing.
That is the only solution.
 
William Brogden
Author and all-around good cowpoke
Posts: 13078
I suggest you take a look at the ServingXML toolkit. It is an open-source "pipeline"-style processor that has been around for a while. The page I cited leads to extensive examples of conversions like the ones you need.

Bill
 
Jimmy Clark
Ranch Hand
Posts: 2187

1) Parse the XML.
2) Transform each sub-item to flat-file format.
3) Write all the sub-item data to the flat file.

Arjun, it would be beneficial to think of these as a single process. Creating the files becomes part of the parsing process: while the parser is reading the XML data, the application is writing to the files, so when parsing is complete the files have already been created. A well-written SAX-based application should be able to process 75 MB in less than 60 seconds.
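A sketch of that single-pass idea, assuming (hypothetically) that each sub-item is a <record> element with simple text fields directly under it, and that the flat file is pipe-delimited; the real layout and output format are not known from this thread. Each record is written as soon as its end tag arrives and then discarded, so only one record is held in memory at a time.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.StringWriter;
import java.io.Writer;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

public class XmlToFlatFile {

    // Hypothetical layout: <records><record><id>1</id><name>A</name></record>...</records>
    // Each <record> becomes one pipe-delimited line: "1|A".
    static final class RecordHandler extends DefaultHandler {
        private final Writer out;
        private final List<String> fields = new ArrayList<>();
        private final StringBuilder text = new StringBuilder();
        private boolean inRecord;

        RecordHandler(Writer out) {
            this.out = out;
        }

        @Override
        public void startElement(String uri, String localName, String qName, Attributes atts) {
            if (qName.equals("record")) {
                inRecord = true;
                fields.clear();
            }
            text.setLength(0); // start collecting text for this element
        }

        @Override
        public void characters(char[] ch, int start, int length) {
            text.append(ch, start, length); // may arrive in several chunks
        }

        @Override
        public void endElement(String uri, String localName, String qName) throws SAXException {
            if (qName.equals("record")) {
                // Record complete: write it out immediately, then forget it.
                try {
                    out.write(String.join("|", fields));
                    out.write('\n');
                } catch (IOException e) {
                    throw new SAXException(e);
                }
                inRecord = false;
            } else if (inRecord) {
                fields.add(text.toString().trim()); // one field of the current record
            }
            text.setLength(0);
        }
    }

    static void convert(InputSource xml, Writer out) throws Exception {
        SAXParserFactory.newInstance().newSAXParser().parse(xml, new RecordHandler(out));
        out.flush();
    }

    public static void main(String[] args) throws Exception {
        String xml = "<records>"
                + "<record><id>1</id><name>Alpha</name></record>"
                + "<record><id>2</id><name>Beta</name></record>"
                + "</records>";
        StringWriter out = new StringWriter();
        convert(new InputSource(new StringReader(xml)), out);
        System.out.print(out); // "1|Alpha" then "2|Beta", one per line
    }
}
```

For the real data you would wrap the input in new InputSource(new BufferedInputStream(new FileInputStream(...))) and write through a BufferedWriter. A field that itself contains child elements would need extra handling.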

If you need some help writing the SAX application, check out the following web page for more information. Good luck!

http://www.retrievalsystems.com/
 
Arjun Shastry
Ranch Hand
Posts: 1907
Thanks, all, for the help. I will definitely look into the suggestions above.
 
Ranch Hand
Posts: 35
Did you consider XSLT as an alternative to Java for this application?
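For comparison, a sketch of the XSLT route driven from Java through javax.xml.transform, reusing a hypothetical <records>/<record> layout and pipe-delimited text output (assumptions, not the poster's real format). One caveat: XSLT 1.0 processors such as the one bundled with the JDK typically build an in-memory tree of the whole input, so this does not avoid DOM-like memory use on a 75 MB file.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

public class XsltFlatFile {

    // Hypothetical stylesheet: one "id|name" text line per <record>.
    static final String XSL =
          "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
        + "<xsl:output method='text'/>"
        + "<xsl:template match='/'>"
        + "<xsl:for-each select='records/record'>"
        + "<xsl:value-of select='id'/><xsl:text>|</xsl:text>"
        + "<xsl:value-of select='name'/><xsl:text>&#10;</xsl:text>"
        + "</xsl:for-each>"
        + "</xsl:template>"
        + "</xsl:stylesheet>";

    static String transform(String xml) throws Exception {
        // Compile the stylesheet with the JDK's default TransformerFactory.
        Transformer t = TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(XSL)));
        StringWriter out = new StringWriter();
        t.transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        String xml = "<records>"
                + "<record><id>1</id><name>Alpha</name></record>"
                + "<record><id>2</id><name>Beta</name></record>"
                + "</records>";
        System.out.print(transform(xml));
    }
}
```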
 