
Slicing XML document

 
Tom Stevns
Ranch Hand
Posts: 122
Hello!

I have an XML document containing 200,000 elements.
Each of those elements should be written to its own new XML file, giving 200,000 files.

I would appreciate hearing your thoughts on this problem.

NB! No validation is needed.

The input should be streamed.

So far, the options I have come up with are SAX2, DOM2, XPath, XSLT, or even just some of the Java String methods.

Thanks in advance
 
Paul Clapham
Sheriff
Posts: 21416
You are just looking for suggestions, correct? Then I would suggest using SAX for the input.
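As a minimal sketch of what SAX input looks like, assuming the 200,000 elements are the direct children of the root element (the class name `RecordCounter` is hypothetical), the handler below only reacts to parser events and never builds a tree, so the document is streamed rather than loaded into memory:

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical example: counts the direct children of the root element while streaming.
class RecordCounter extends DefaultHandler {
    private int depth = 0;   // current nesting level; the root element is depth 1
    private int records = 0; // direct children of the root seen so far

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        depth++;
        if (depth == 2) {
            records++; // assumption: each direct child of the root is one element to slice out
        }
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        depth--;
    }

    static int count(InputSource in) throws Exception {
        SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
        RecordCounter handler = new RecordCounter();
        parser.parse(in, handler); // events arrive as the input is read; no tree is built
        return handler.records;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<root><item id=\"1\"/><item id=\"2\"><sub/></item></root>";
        System.out.println(count(new InputSource(new StringReader(xml)))); // prints 2
    }
}
```

For a real file you would pass `new InputSource(new FileInputStream(path))` instead of the `StringReader`.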
 
John Simpson
Greenhorn
Posts: 25
JAXB is what I am using; it's fairly straightforward. Just a suggestion.
 
William Brogden
Author and all-around good cowpoke
Rancher
Posts: 13074
Sounds like basic SAX processing to me with a whole #@load of FileWriter creations.

Where in the world are these 200,000 files going? I would worry that having that many files in one directory would surely strain any operating system. If the files are just an intermediate step and get consumed by another process there may be a lot easier way to get the job done.

Bill
 
Rahul Bhattacharjee
Ranch Hand
Posts: 2308
Originally posted by Paul Clapham:
You are just looking for suggestions, correct? Then I would suggest using SAX for the input.


+1
 
Tom Stevns
Ranch Hand
Posts: 122
Hello

Thank you for the suggestions.

Regarding: Where in the world are these 200,000 files going?

William Brogden - These files are put onto an MQ queue in slices because
our MQ system does not allow messages of around half a gigabyte.

I did something similar with SAX and "Streaming a large HTTP file" about four years ago, but this time it has to be done in a more "generic" (I hate that word) fashion.

Have a nice day or evening, all of you.
 
William Brogden
Author and all-around good cowpoke
Rancher
Posts: 13074
William Brogden - These files are put onto an MQ queue in slices because
our MQ system does not allow messages of around half a gigabyte.


So you are not really writing files to disk but creating messages and adding them to an MQ server. Therefore everything depends on what your MQ messages have to be constructed from.

The simplest approach (assuming you want to use XML parsing) would be to build each message from a String. For each element that becomes a separate document, you would create a StringWriter when its startElement event gets called, write to it as the events for the contained tags occur, and finally create the MQ message when the matching endElement event occurs.
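A minimal sketch of that StringWriter approach, assuming the elements to split out are the direct children of the root. `SlicingHandler` is a hypothetical name, and the `Consumer<String>` stands in for whatever creates and sends the MQ message, since the MQ client API isn't part of this thread:

```java
import java.io.StringReader;
import java.io.StringWriter;
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical handler: rebuilds each direct child of the root as a String.
class SlicingHandler extends DefaultHandler {
    private final Consumer<String> sink; // stand-in for "create the MQ message"
    private StringWriter current;        // buffer for the record being built; null between records
    private int depth = 0;               // the root element is depth 1

    SlicingHandler(Consumer<String> sink) {
        this.sink = sink;
    }

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        depth++;
        if (depth == 2) {
            current = new StringWriter(); // a new record starts
        }
        if (current != null) {
            current.write("<" + qName);
            for (int i = 0; i < atts.getLength(); i++) {
                current.write(" " + atts.getQName(i) + "=\"" + escape(atts.getValue(i)) + "\"");
            }
            current.write(">");
        }
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        if (current != null) { // text outside a record (e.g. whitespace under the root) is dropped
            current.write(escape(new String(ch, start, length)));
        }
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        if (current != null) {
            current.write("</" + qName + ">");
        }
        if (depth == 2) {
            sink.accept(current.toString()); // record complete: hand it off as one message
            current = null;
        }
        depth--;
    }

    // Re-escape markup characters that the parser has already decoded.
    private static String escape(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;").replace("\"", "&quot;");
    }

    static List<String> slice(String xml) throws Exception {
        List<String> messages = new ArrayList<>();
        SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), new SlicingHandler(messages::add));
        return messages;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<root>\n  <item id=\"1\">a &amp; b</item>\n  <item id=\"2\"><sub>x</sub></item>\n</root>";
        for (String message : slice(xml)) {
            System.out.println(message);
        }
    }
}
```

This sketch ignores comments, processing instructions and namespace declarations made on the root element, and writes empty elements as `<a></a>`; a record that relies on a prefix declared on the root would need that declaration copied in.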

Even simpler would be to read the file as text and locate the start and end tags with plain String operations, line by line, but then you lose the parser's error checking.
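For comparison, a line-based sketch of the String approach, assuming each record's start tag is on its own line and its end tag ends a line. The tag name and the class name `LineSlicer` are hypothetical, and nothing here checks well-formedness:

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

// Hypothetical line-based slicer: plain String matching, no XML parsing or error checking.
class LineSlicer {
    // Assumes a record starts on a line containing "<tag" and ends on a line containing "</tag>".
    static List<String> slice(BufferedReader in, String tag) throws IOException {
        String open = "<" + tag;
        String close = "</" + tag + ">";
        List<String> slices = new ArrayList<>();
        StringBuilder current = null; // lines of the record being collected; null between records
        String line;
        while ((line = in.readLine()) != null) {
            if (current == null && line.contains(open)) {
                current = new StringBuilder();
            }
            if (current != null) {
                current.append(line).append('\n');
                if (line.contains(close)) {
                    slices.add(current.toString());
                    current = null;
                }
            }
        }
        return slices;
    }

    public static void main(String[] args) throws IOException {
        String text = "<root>\n<item id=\"1\">a</item>\n<item id=\"2\">\n<sub/>\n</item>\n</root>\n";
        for (String s : slice(new BufferedReader(new StringReader(text)), "item")) {
            System.out.print(s);
        }
    }
}
```

If two records share a line, for example a whole document written on one line, this returns the entire line as a single slice, and a tag such as `<itemList>` would also be mistaken for `<item`.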

In any case I bet the limiting factor will be the speed at which the MQ server accepts messages.

Bill
 
Tom Stevns
Ranch Hand
Posts: 122
Hello William!

--------------------------------------------------------------------------------
In any case I bet the limiting factor will be the speed at which the MQ server accepts messages.
--------------------------------------------------------------------------------

I will give you an answer tomorrow, because I've already created and integration-tested that part. I just have to write a test program that generates 200,000 (2*10^5) dummy messages and writes them to an input queue.
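A sketch of generating such a test input with StAX's `XMLStreamWriter`, which sends each element to the `Writer` as it is produced instead of building a DOM. The element names `root` and `item` and the class `DummyInputGenerator` are assumptions; putting the result on the input queue depends on the MQ client API, which isn't shown:

```java
import java.io.StringWriter;
import java.io.Writer;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamWriter;

// Hypothetical test-data generator: <root> with `count` dummy <item> children.
class DummyInputGenerator {
    static void generate(Writer out, int count) throws XMLStreamException {
        XMLStreamWriter w = XMLOutputFactory.newInstance().createXMLStreamWriter(out);
        w.writeStartDocument();
        w.writeStartElement("root");
        for (int i = 0; i < count; i++) {
            w.writeCharacters("\n  ");                // cosmetic line break and indent
            w.writeStartElement("item");
            w.writeAttribute("id", Integer.toString(i));
            w.writeCharacters("dummy message " + i); // the writer escapes text if needed
            w.writeEndElement();                     // no tree is kept; elements are streamed out
        }
        w.writeCharacters("\n");
        w.writeEndElement();
        w.writeEndDocument();
        w.flush();
        w.close(); // does not close the underlying Writer; the caller owns it
    }

    public static void main(String[] args) throws XMLStreamException {
        StringWriter sw = new StringWriter();
        generate(sw, 3);
        System.out.println(sw);
    }
}
```

For the real test you would pass a `FileWriter` (closed afterwards, e.g. with try-with-resources) and a count of 200,000.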

About the implementation: the only thing I can be sure of is that the
XML is well-formed. Nothing guarantees how it is broken into lines, so a line-dependent approach is too risky.

I think an XMLReader and an XMLWriter combined with an iterator will do
the job.
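One way to read "reader, writer and an iterator" with the standard StAX API, as a sketch: `XMLEventReader` is an `Iterator` over parse events, and a fresh `XMLEventWriter` copies each direct child of the root into its own `StringWriter`. The class name `StaxSlicer` and the rule that records sit at depth 2 are assumptions:

```java
import java.io.Reader;
import java.io.StringReader;
import java.io.StringWriter;
import java.util.ArrayList;
import java.util.List;
import javax.xml.stream.XMLEventReader;
import javax.xml.stream.XMLEventWriter;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.events.XMLEvent;

// Hypothetical StAX splitter: one String per direct child of the root element.
class StaxSlicer {
    static List<String> slice(Reader input) throws XMLStreamException {
        XMLEventReader reader = XMLInputFactory.newInstance().createXMLEventReader(input);
        XMLOutputFactory outFactory = XMLOutputFactory.newInstance();
        List<String> slices = new ArrayList<>();
        int depth = 0;                 // the root element is depth 1
        StringWriter buffer = null;    // text of the record being copied; null between records
        XMLEventWriter writer = null;  // writes into buffer while inside a record
        while (reader.hasNext()) {
            XMLEvent event = reader.nextEvent();
            if (event.isStartElement()) {
                depth++;
                if (depth == 2) {      // a record starts: open a fresh writer for it
                    buffer = new StringWriter();
                    writer = outFactory.createXMLEventWriter(buffer);
                }
            }
            if (writer != null) {
                writer.add(event);     // copy the event, including child elements and text
            }
            if (event.isEndElement()) {
                if (depth == 2) {      // record complete
                    writer.flush();
                    writer.close();
                    slices.add(buffer.toString());
                    writer = null;
                    buffer = null;
                }
                depth--;
            }
        }
        reader.close();
        return slices;
    }

    public static void main(String[] args) throws XMLStreamException {
        String xml = "<root>\n  <item id=\"1\">a &amp; b</item>\n  <item id=\"2\"><sub>x</sub></item>\n</root>";
        for (String s : slice(new StringReader(xml))) {
            System.out.println(s);
        }
    }
}
```

Namespace declarations made on the root element are not copied into the slices, so prefixed content would need them added back.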

Anyway, I just have to try. Ideally the whole thing would take fewer than 15 lines.
There must be an XML API with a String-like method that simply grabs the whole content of an XML element, including its child elements. ;)
 
William Brogden
Author and all-around good cowpoke
Rancher
Posts: 13074
There must be an XML API with a String-like method that simply grabs the whole content of an XML element, including its child elements. ;)


You might find a "pipeline" style processing toolkit that could do the job, see my summary article on pipeline toolkits - the ServingXML toolkit looks like your best bet.

Bill
 
Tom Stevns
Ranch Hand
Posts: 122
Thank you very much, William!

I look forward to reading about it tomorrow.

Sweet dreams |o)
 