xml parser design question

 
Rancher
Posts: 1776
Hi,
I have a requirement to parse an XML file of about 125,000 lines (a database extract) into another text file, specifically an .ntriples file. While parsing the XML, I have to take the node names, attribute names, attribute values and CDATA content, translate them into meaningful URIs, and write them to the text file. Consider the sample below:

<Students>
<student name="john" age="22" subject="geography">John is good singer</student>
<student name="jai" age="22" subject="java">Jai is a good dancer</student>
</Students>


.... and similarly many other nodes and attributes. I have to parse this and write it to a text file like the one below:

<https://coderanch.com/student>; <www.xmlschema#typeOf> <https://coderanch.com/Students>.
<https://coderanch.com/student>; <www.xmlschema#name> "John".
<https://coderanch.com/student>; <www.xmlschema#age> "22".
<https://coderanch.com/student>; <www.xmlschema#subject> "geography".
<https://coderanch.com/student>; <www.xmlschema#generaldescription> "John is good singer".


.... and similarly, the .ntriples file will contain all the information parsed from the XML, as above.

My Questions ->
1. Which parser should I use, DOM or SAX? I have written one or two of each. I think that with 10,000 nodes, iterating through a node list with DOM will take a long time, and it will be difficult to code because the XML also contains many CDATA sections. In addition, the application need not be super fast, as it will only run as a batch job.

2. How do I do the lookup quickly? Say I hit a <student> node; I then need to know that its corresponding URI is <https://coderanch.com/student>. There can be around 100 such URI mappings for nodes and attributes. Should I load the node-to-URI mapping with java.util.Properties, or keep it in a constants file?

3. Which FileWriter should I use? The .ntriples file need not be encoded.

Thanks,
John
 
Marshal
Posts: 28288
95
Eclipse IDE Firefox Browser MySQL Database
  • Likes 1
Since the task appears to be to transform an XML document into a text document, my first instinct would be to write an XSL transformation. I might think twice about that if the business logic turned out to be complex, but I do try to avoid writing transformations with low-level tools like SAX or DOM.

(And by the way using an XML parser is not the same as writing an XML parser.)
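
To make the XSL transformation idea above a little more concrete, here is a minimal sketch that runs a stylesheet from Java with the JDK's built-in javax.xml.transform API. The class name XsltToNTriples and the stylesheet itself are hypothetical; the URIs are copied from the sample output in the question. It assumes the semicolon after the subject in that sample is not wanted, and it does not escape quotes or backslashes inside literal values, which a real N-Triples writer would need to do.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

class XsltToNTriples {

    // Hypothetical stylesheet: one triple per attribute, plus one for the element text.
    static final String XSLT = String.join("\n",
        "<xsl:stylesheet version=\"1.0\" xmlns:xsl=\"http://www.w3.org/1999/XSL/Transform\">",
        "  <xsl:output method=\"text\"/>",
        "  <xsl:template match=\"/Students\">",
        "    <xsl:apply-templates select=\"student\"/>",
        "  </xsl:template>",
        "  <xsl:template match=\"student\">",
        "    <xsl:text>&lt;https://coderanch.com/student&gt; &lt;www.xmlschema#typeOf&gt; &lt;https://coderanch.com/Students&gt; .&#10;</xsl:text>",
        "    <xsl:for-each select=\"@*\">",
        "      <xsl:text>&lt;https://coderanch.com/student&gt; &lt;www.xmlschema#</xsl:text>",
        "      <xsl:value-of select=\"name()\"/>",
        "      <xsl:text>&gt; \"</xsl:text>",
        "      <xsl:value-of select=\".\"/>",
        "      <xsl:text>\" .&#10;</xsl:text>",
        "    </xsl:for-each>",
        "    <xsl:text>&lt;https://coderanch.com/student&gt; &lt;www.xmlschema#generaldescription&gt; \"</xsl:text>",
        "    <xsl:value-of select=\".\"/>",
        "    <xsl:text>\" .&#10;</xsl:text>",
        "  </xsl:template>",
        "</xsl:stylesheet>");

    // Applies the stylesheet to an XML string and returns the text output.
    static String transform(String xml) throws Exception {
        Transformer transformer = TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(XSLT)));
        StringWriter out = new StringWriter();
        transformer.transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        String xml = "<Students>"
                + "<student name=\"john\" age=\"22\" subject=\"geography\">John is good singer</student>"
                + "</Students>";
        System.out.print(transform(xml));
    }
}
```

For the real file, the StreamSource and StreamResult could wrap files instead of strings. If the node-to-URI mapping grows to around 100 entries, the stylesheet would need one template (or a lookup table) per mapping, which is where the "think twice" caveat above might start to apply.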
 
Ranch Hand
Posts: 96
Eclipse IDE Oracle Java
  • Likes 1
I think there are many arguments in favor of one or the other (see for example here) but I do agree that an XSLT transformation seems a logical solution.
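
For comparison with the XSLT route, here is a rough sketch of what a streaming SAX version might look like, with the node-to-URI mapping loaded through java.util.Properties as the question suggests. Everything named here is an assumption made for illustration: the class SaxToNTriples, the key scheme (element.NAME, attr.NAME, text.NAME) and the mapping entries, whose URIs are taken from the sample in the question. Output goes to a plain Writer so that the same code could be pointed at a file; the sketch only covers attributes and element text.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.StringWriter;
import java.io.Writer;
import java.util.Properties;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

class SaxToNTriples extends DefaultHandler {

    // Hypothetical mapping in .properties syntax; in practice this would live in its own file.
    static final String MAPPING =
          "element.student=<https://coderanch.com/student>\n"
        + "attr.name=<www.xmlschema#name>\n"
        + "attr.age=<www.xmlschema#age>\n"
        + "attr.subject=<www.xmlschema#subject>\n"
        + "text.student=<www.xmlschema#generaldescription>\n";

    private final Properties uris;
    private final Writer out;
    private final StringBuilder text = new StringBuilder();

    SaxToNTriples(Properties uris, Writer out) {
        this.uris = uris;
        this.out = out;
    }

    @Override
    public void startElement(String ns, String local, String qName, Attributes atts)
            throws SAXException {
        text.setLength(0); // start collecting character data for this element
        String subject = uris.getProperty("element." + qName);
        if (subject == null) return; // unmapped elements are skipped
        for (int i = 0; i < atts.getLength(); i++) {
            String predicate = uris.getProperty("attr." + atts.getQName(i));
            if (predicate != null) emit(subject, predicate, atts.getValue(i));
        }
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        text.append(ch, start, length); // plain text and CDATA content both arrive here
    }

    @Override
    public void endElement(String ns, String local, String qName) throws SAXException {
        String subject = uris.getProperty("element." + qName);
        String predicate = uris.getProperty("text." + qName);
        String value = text.toString().trim();
        if (subject != null && predicate != null && !value.isEmpty()) {
            emit(subject, predicate, value);
        }
        text.setLength(0);
    }

    // Writes one triple, escaping backslashes, quotes and line breaks in the literal.
    private void emit(String subject, String predicate, String literal) throws SAXException {
        String escaped = literal.replace("\\", "\\\\").replace("\"", "\\\"")
                .replace("\n", "\\n").replace("\r", "\\r");
        try {
            out.write(subject + " " + predicate + " \"" + escaped + "\" .\n");
        } catch (IOException e) {
            throw new SAXException(e);
        }
    }

    // Parses the XML string with the given mapping and returns the generated triples.
    static String convert(String xml, String mapping) throws Exception {
        Properties uris = new Properties();
        uris.load(new StringReader(mapping));
        StringWriter out = new StringWriter();
        SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), new SaxToNTriples(uris, out));
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        String xml = "<Students>"
                + "<student name=\"jai\" age=\"22\" subject=\"java\">Jai is a good dancer</student>"
                + "</Students>";
        System.out.print(convert(xml, MAPPING));
    }
}
```

A lookup in a Properties object is a hash lookup, so around 100 mappings should not be a bottleneck either way; the main difference from a constants file is that the mapping can change without recompiling. For writing to disk, something like Files.newBufferedWriter(path, StandardCharsets.UTF_8) could be passed in as the Writer, since it makes the character encoding explicit.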

Cheers,
Wim
 
John Jai
Rancher
Posts: 1776
Thanks Paul & Wanni for your replies
 