
XML parsing examples and/or tutorials

 
T Dahl
Ranch Hand
Posts: 35
Hi,

I'm trying to learn Java, XML and regex at the same time. I need this to create a program that will parse a mixed-content XML file and produce another. The input files are big documents, and the output files are the same documents with the markup rewritten based on the textual content. My biggest challenge seems to be identifying textual patterns that may cross node boundaries. A node could be an element, a comment, a processing instruction, and so on. Of course, elements are not created equal, and most can have attributes that I need to retain and possibly add to newly created elements.
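
To make the cross-node problem concrete, here is a minimal sketch of one possible approach, not a definitive solution: flatten all text nodes into one string, remember where each node starts, run the regex on the flat string, and map match offsets back to text nodes. The class name `CrossNodeMatch`, its helper methods, and the sample input are made up for illustration; only the standard JAXP/DOM calls are real.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.Text;
import org.xml.sax.InputSource;

// Hypothetical helper class, for illustration only.
class CrossNodeMatch {

    // Parse an XML string into a DOM Document (checked exceptions wrapped for brevity).
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    // Collect every Text node below 'node' in document order.
    // Comments and processing instructions are not Text nodes, so they are skipped.
    static void collectText(Node node, List<Text> out) {
        if (node instanceof Text) {
            out.add((Text) node);
        }
        for (Node c = node.getFirstChild(); c != null; c = c.getNextSibling()) {
            collectText(c, out);
        }
    }

    // Index of the last text node whose start offset is <= pos.
    static int nodeIndexAt(int[] starts, int pos) {
        int i = 0;
        while (i + 1 < starts.length && starts[i + 1] <= pos) {
            i++;
        }
        return i;
    }

    // For each regex match in the flattened text, report which text nodes it spans.
    // Assumes the pattern never matches the empty string.
    static List<String> find(Document doc, Pattern p) {
        List<Text> texts = new ArrayList<>();
        collectText(doc.getDocumentElement(), texts);

        StringBuilder flat = new StringBuilder();
        int[] starts = new int[texts.size()];
        for (int i = 0; i < texts.size(); i++) {
            starts[i] = flat.length();          // offset of node i in the flat string
            flat.append(texts.get(i).getData());
        }

        List<String> result = new ArrayList<>();
        Matcher m = p.matcher(flat);
        while (m.find()) {
            int first = nodeIndexAt(starts, m.start());
            int last = nodeIndexAt(starts, m.end() - 1);
            result.add(m.group() + " spans text nodes " + first + ".." + last);
        }
        return result;
    }

    public static void main(String[] args) {
        // "section 4.2" is split by an element and a comment in this sample.
        Document doc = parse("<p>See section <b>4</b>.<!-- note -->2 for details</p>");
        System.out.println(find(doc, Pattern.compile("section \\d+\\.\\d+")));
    }
}
```

With the node indexes in hand, the start and end nodes of each match are known, which is the information needed before any markup can be rewritten around the match.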

Most likely I will use DOM, since I need to do some look-ahead and perhaps also look-behind to recognize patterns and where they start and end. DOM also seems to be a good choice for mixed content (an element can contain text and child elements in any order, recursively). Feel free to try to convince me there is a better alternative to DOM!
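
As a rough illustration of what rewriting markup in a DOM tree can look like, the sketch below walks mixed content recursively and wraps each regex match found inside a single text node in a new element, leaving existing elements and their attributes untouched. It deliberately covers only the simpler case where a match does not cross a node boundary. The class name `WrapInNode`, the `phone` tag, and the sample input are assumptions for the example; `Text.splitText`, `replaceChild`, and the `Transformer` serializer are standard DOM/JAXP calls.

```java
import java.io.StringReader;
import java.io.StringWriter;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.Text;
import org.xml.sax.InputSource;

// Hypothetical helper class, for illustration only.
class WrapInNode {

    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    // Recursively visit mixed content: text children are rewritten, element children are descended into.
    static void wrap(Node node, Pattern p, String tag) {
        // Snapshot the children first, because wrapText changes the child list.
        List<Node> children = new ArrayList<>();
        for (Node c = node.getFirstChild(); c != null; c = c.getNextSibling()) {
            children.add(c);
        }
        for (Node c : children) {
            if (c instanceof Text) {
                wrapText((Text) c, p, tag);
            } else if (c.getNodeType() == Node.ELEMENT_NODE) {
                wrap(c, p, tag);
            }
        }
    }

    // Split the text node around each match and move the matched part into a new element.
    static void wrapText(Text text, Pattern p, String tag) {
        Matcher m = p.matcher(text.getData());
        if (!m.find() || m.end() == m.start()) {
            return; // no match, or an empty match that would never advance
        }
        Text hit = text.splitText(m.start());            // text = before, hit = match + rest
        Text rest = hit.splitText(m.end() - m.start());  // hit = match, rest = remainder
        Element e = text.getOwnerDocument().createElement(tag);
        hit.getParentNode().replaceChild(e, hit);
        e.appendChild(hit);
        wrapText(rest, p, tag);                          // look for further matches
    }

    // Serialize the (modified) document back to a string.
    static String toXml(Document doc) {
        try {
            Transformer t = TransformerFactory.newInstance().newTransformer();
            t.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            t.transform(new DOMSource(doc), new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        Document doc = parse("<p class=\"x\">Call 555-1234 or <i>555-9999</i> now</p>");
        wrap(doc.getDocumentElement(), Pattern.compile("\\d{3}-\\d{4}"), "phone");
        System.out.println(toXml(doc));
    }
}
```

Snapshotting the child list before modifying it is the main design point here: inserting nodes while iterating with `getNextSibling()` is an easy way to visit new nodes by accident or skip old ones.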

I have also looked at XPath. I can see that it is powerful but I don't see how it could help me.
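
For reference, this is roughly what XPath usage from Java looks like with the standard `javax.xml.xpath` API. The sketch only selects the text nodes below certain elements, which might serve as a first filtering step before any pattern matching; whether that is useful for this problem is an open question. The class name `XPathTextNodes`, the expression, and the sample document are assumptions for the example.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper class, for illustration only.
class XPathTextNodes {

    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    // Evaluate an XPath expression and return the string value of every selected node.
    static List<String> select(Document doc, String expression) {
        try {
            XPath xpath = XPathFactory.newInstance().newXPath();
            NodeList nodes = (NodeList) xpath.evaluate(expression, doc, XPathConstants.NODESET);
            List<String> values = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                values.add(nodes.item(i).getNodeValue());
            }
            return values;
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        Document doc = parse(
                "<doc><para id=\"a\">one <b>two</b></para><para>three</para></doc>");
        // All text nodes at any depth below <para> elements that carry an id attribute.
        System.out.println(select(doc, "//para[@id]//text()"));
    }
}
```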

I have found some examples and a little bit of tutorial information, but most tackle rather simple problems. What I would like are pointers to XML parsing and construction examples that could give me more ideas and inspiration for learning good techniques for handling semi-complex cases.
 
akhter wahab
Ranch Hand
Posts: 151
Java MyEclipse IDE Python
HtmlUnit helps a lot. This might help you a lot: webpage
 
T Dahl
Ranch Hand
Posts: 35
Thanks!

A parser for HTML probably has many of the same challenges as a parser for XML. I will see if I can find some inspiration in the source code (which at first sight looks enormous).

Other pointers are still welcome of course!
 