
XML parser design question

 
John Jai
Rancher
Posts: 1776
Hi,
I have a requirement to parse an XML file of about 125,000 lines (extracted from a database) and convert it into another text file (specifically, an .ntriples file). While parsing the XML, I have to take the node names, attribute names, attribute values, and CDATA content, translate them into meaningful URIs, and write them to the text file. Consider the sample below:
<Students>
<student name="john" age="22" subject="geography">John is good singer</student>
<student name="jai" age="22" subject="java">Jai is a good dancer</student>
</Students>

...and many more nodes and attributes of different kinds. I have to parse this and write it into a text file like the following:
<http://www.coderanch.com/student> <www.xmlschema#typeOf> <http://www.coderanch.com/Students>.
<http://www.coderanch.com/student> <www.xmlschema#name> "John".
<http://www.coderanch.com/student> <www.xmlschema#age> "22".
<http://www.coderanch.com/student> <www.xmlschema#subject> "geography".
<http://www.coderanch.com/student> <www.xmlschema#generaldescription> "John is good singer".

...and so on: the .ntriples file will contain all the information parsed from the XML, in the format shown above.
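For reference, each output line has the N-Triples shape `<subject> <predicate> object .`, where the object is either an IRI in angle brackets or a quoted literal. Below is a minimal Java sketch of a formatter for such lines; the class name `NTriples` and its methods are hypothetical. It escapes the characters N-Triples does not allow raw inside a literal (backslash, double quote, line feed, carriage return), which matters for free-text CDATA content. Strict N-Triples also expects absolute IRIs, so placeholders such as `www.xmlschema#name` would need a scheme such as `http://` to pass a validator.

```java
/**
 * Hypothetical helper that formats single N-Triples statements.
 * Assumes subject, predicate and IRI objects are already absolute IRIs.
 */
final class NTriples {

    private NTriples() {}

    /** Statement whose object is an IRI, e.g. the typeOf line. */
    static String iriTriple(String subject, String predicate, String object) {
        return "<" + subject + "> <" + predicate + "> <" + object + "> .";
    }

    /** Statement whose object is a plain string literal, e.g. an attribute value. */
    static String literalTriple(String subject, String predicate, String value) {
        return "<" + subject + "> <" + predicate + "> \"" + escape(value) + "\" .";
    }

    /** Escapes the characters that may not appear unescaped inside an N-Triples literal. */
    static String escape(String s) {
        StringBuilder sb = new StringBuilder(s.length());
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            switch (c) {
                case '\\': sb.append("\\\\"); break;
                case '"':  sb.append("\\\""); break;
                case '\n': sb.append("\\n");  break;
                case '\r': sb.append("\\r");  break;
                default:   sb.append(c);
            }
        }
        return sb.toString();
    }
}
```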

My questions:
1. Which parser should I use, DOM or SAX? I have written a program or two with each, and I think that with 10,000 nodes, iterating through a node list with DOM would be slow and hard to code, since the XML also contains many CDATA sections. That said, the application does not need to be especially fast, since it will only run as a batch job.

2. How can I do the lookups efficiently? Say I hit a <student> node; I then need to know that its corresponding URI is <http://www.coderanch.com/student>. There can be around 100 such URI mappings for nodes and attributes. Should I load the node-to-URI mappings with java.util.Properties or keep them in a constants file?

3. Which writer class should I use? The .ntriples file does not need any special encoding.
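Taking the three questions together, here is a minimal SAX sketch in Java, assuming the ~100 mappings live in a `.properties` file loaded into `java.util.Properties`. SAX streams the document, so memory stays flat however large the file is, and CDATA sections reach `characters()` as ordinary text (possibly in several chunks, hence the per-element buffer). The class name `XmlToNTriples` and the key conventions are hypothetical: plain element names, `@`-prefixed attribute names, and `$typeOf` / `$text` for the two fixed predicates. Neither `@` nor `$` can appear in an XML name, so these keys cannot collide with element names, and unlike `#` they do not start a comment line in a properties file. Output goes through `Files.newBufferedWriter` with explicit UTF-8 (the encoding the W3C N-Triples 1.1 recommendation specifies) instead of `FileWriter`, which uses the platform default charset unless one is passed explicitly (possible since Java 11).

```java
import java.io.*;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.Properties;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

/** Sketch: streams XML through SAX and writes one N-Triples line per fact. */
final class XmlToNTriples extends DefaultHandler {
    private final Properties uris;                                   // hypothetical name -> URI mapping
    private final Writer out;
    private final Deque<String> elements = new ArrayDeque<>();       // names of currently open elements
    private final Deque<StringBuilder> texts = new ArrayDeque<>();   // text collected per open element

    XmlToNTriples(Properties uris, Writer out) {
        this.uris = uris;
        this.out = out;
    }

    static void convert(InputStream xml, Properties uris, Writer out) throws Exception {
        SAXParserFactory.newInstance().newSAXParser().parse(xml, new XmlToNTriples(uris, out));
        out.flush();
    }

    @Override
    public void startElement(String ns, String local, String qName, Attributes atts) throws SAXException {
        String subject = uri(qName);
        if (!elements.isEmpty()) {
            // e.g. <student> typeOf <Students>: the type is the enclosing element
            iri(subject, uri("$typeOf"), uri(elements.peek()));
        }
        for (int i = 0; i < atts.getLength(); i++) {
            literal(subject, uri("@" + atts.getQName(i)), atts.getValue(i));
        }
        elements.push(qName);
        texts.push(new StringBuilder());
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        // CDATA content arrives here too, already unwrapped, possibly split across calls
        if (!texts.isEmpty()) texts.peek().append(ch, start, length);
    }

    @Override
    public void endElement(String ns, String local, String qName) throws SAXException {
        elements.pop();
        String text = texts.pop().toString().trim();
        if (!text.isEmpty()) literal(uri(qName), uri("$text"), text);
    }

    // Looks up a mapping and fails fast if it is missing
    private String uri(String key) throws SAXException {
        String value = uris.getProperty(key);
        if (value == null) throw new SAXException("No URI mapping for: " + key);
        return value;
    }

    private void iri(String s, String p, String o) throws SAXException {
        write("<" + s + "> <" + p + "> <" + o + "> .");
    }

    private void literal(String s, String p, String value) throws SAXException {
        // Escape backslash first, then quote and line breaks, as N-Triples literals require
        String esc = value.replace("\\", "\\\\").replace("\"", "\\\"")
                          .replace("\n", "\\n").replace("\r", "\\r");
        write("<" + s + "> <" + p + "> \"" + esc + "\" .");
    }

    private void write(String line) throws SAXException {
        try {
            out.write(line);
            out.write('\n');
        } catch (IOException e) {
            throw new SAXException(e);
        }
    }

    // Usage: java XmlToNTriples input.xml mapping.properties output.nt  (hypothetical file names)
    public static void main(String[] args) throws Exception {
        Properties uris = new Properties();
        try (Reader r = Files.newBufferedReader(Paths.get(args[1]), StandardCharsets.UTF_8)) {
            uris.load(r);
        }
        try (InputStream in = new BufferedInputStream(Files.newInputStream(Paths.get(args[0])));
             Writer w = Files.newBufferedWriter(Paths.get(args[2]), StandardCharsets.UTF_8)) {
            convert(in, uris, w);
        }
    }
}
```

A mapping file for the sample would contain lines such as `student=http://www.coderanch.com/student` and `@age=www.xmlschema#age`. Like the sample output, the sketch uses the element's URI as the subject, so both `<student>` elements end up describing one subject; distinct records would need distinct subject URIs.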

Thanks,
John
 
Paul Clapham
Sheriff
Posts: 21416
Since the task appears to be to transform an XML document into a text document, my first instinct would be to write an XSL transformation. I might think twice about that if the business logic turned out to be complex, but I do try to avoid writing transformations with low-level tools like SAX or DOM.
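An XSLT version of the same idea can be run from Java through the standard `javax.xml.transform` API, which ships with the JDK. This is a minimal sketch, not a complete mapping: the class name `XsltToNTriples` is hypothetical, the stylesheet is embedded as a string, it only handles `student` elements, and it assumes, as the sample output happens to suggest, that every URI is a fixed prefix plus the XML name. An irregular mapping of ~100 names would need a lookup table, for example a second XML document read with `document()`. XSLT 1.0 (what the JDK's built-in processor supports) also has no string `replace()`, so this version does not escape quotes or backslashes inside literals.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

final class XsltToNTriples {

    // Hypothetical stylesheet: URI = fixed prefix + XML name, text output method.
    // The empty text() template suppresses the built-in rule that copies stray text.
    static final String XSLT =
          "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
        + "<xsl:output method='text'/>"
        + "<xsl:template match='text()'/>"
        + "<xsl:template match='student'>"
        + "<xsl:variable name='s' select=\"concat('&lt;http://www.coderanch.com/', name(), '&gt; ')\"/>"
        + "<xsl:value-of select=\"concat($s, '&lt;www.xmlschema#typeOf&gt; &lt;http://www.coderanch.com/', name(..), '&gt; .&#10;')\"/>"
        + "<xsl:for-each select='@*'>"
        + "<xsl:value-of select=\"concat($s, '&lt;www.xmlschema#', name(), '&gt; &quot;', ., '&quot; .&#10;')\"/>"
        + "</xsl:for-each>"
        + "<xsl:value-of select=\"concat($s, '&lt;www.xmlschema#generaldescription&gt; &quot;', normalize-space(.), '&quot; .&#10;')\"/>"
        + "</xsl:template>"
        + "</xsl:stylesheet>";

    /** Applies the embedded stylesheet to an XML string and returns the N-Triples text. */
    static String transform(String xml) throws TransformerException {
        Transformer t = TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(XSLT)));
        StringWriter out = new StringWriter();
        t.transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
        return out.toString();
    }
}
```

For a batch job, the same `transform` call can take a `StreamSource` built from the input file and a `StreamResult` wrapping the output file instead of strings.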

(And by the way, using an XML parser is not the same as writing an XML parser.)
 
Wim Vanni
Ranch Hand
Posts: 96
I think there are many arguments in favor of one or the other (see, for example, here), but I do agree that an XSLT transformation seems a logical solution.

Cheers,
Wim
 
John Jai
Rancher
Posts: 1776
Thanks, Paul and Wim, for your replies.
 