xml parser design question

 
Rancher
Posts: 1776
Hi,
I have a requirement to parse an XML file of about 125,000 lines (a DB extract) into another text file (specifically, an .ntriples file). While parsing the XML, I have to take the node names, attribute names, attribute values and CDATA content, translate them into meaningful URIs, and write them to the text file. Consider the sample below:

<Students>
<student name="john" age="22" subject="geography">John is good singer</student>
<student name="jai" age="22" subject="java">Jai is a good dancer</student>
</Students>


.... and similarly many other nodes and attributes. I have to parse this and write it to a text file like the one below:

<http://www.coderanch.com/student> <www.xmlschema#typeOf> <http://www.coderanch.com/Students>.
<http://www.coderanch.com/student> <www.xmlschema#name> "John".
<http://www.coderanch.com/student> <www.xmlschema#age> "22".
<http://www.coderanch.com/student> <www.xmlschema#subject> "geography".
<http://www.coderanch.com/student> <www.xmlschema#generaldescription> "John is good singer".


.... similarly, this .ntriples file will contain all the information from the parsed XML, written as above.

My questions:
1. Which parser should I use, DOM or SAX? I have written one or two of each, and I think that with 10,000 nodes, iterating through a node list with DOM will be slow and difficult to code, since the XML also contains many CDATA sections. Also, the application need not be super fast, as it will only run as a batch job.

2. How can I do the lookup quickly? Say I hit a <student> node and need to know that its corresponding URI is <http://www.coderanch.com/student>. There can be around 100 such URI mappings for nodes and attributes. Should I load the node-to-URI mapping using java.util.Properties, or keep it in a constants file?

3. Which FileWriter should I use? The .ntriples file need not be encoded.

Thanks,
John
 
Sheriff
Posts: 23706
Since the task appears to be to transform an XML document into a text document, my first instinct would be to write an XSL transformation. I might think twice about that if the business logic turned out to be complex, but I do try to avoid writing transformations with low-level tools like SAX or DOM.

(And by the way using an XML parser is not the same as writing an XML parser.)
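To make the XSLT suggestion concrete, here is a minimal sketch that runs a stylesheet from Java using the JDK's built-in javax.xml.transform API. The class name XsltNTriplesSketch and the stylesheet itself are illustrative assumptions, not code from this thread. The URIs are hard-coded from the sample in the question, and literal values are not escaped, so a real stylesheet would need both a proper mapping and escaping of quotes and backslashes.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

// Hypothetical sketch: XML -> N-Triples-style text through an XSLT 1.0 stylesheet.
class XsltNTriplesSketch {

    // Assumed stylesheet; subject and predicate URIs are copied from the sample in the question.
    static final String STYLESHEET =
          "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
        + "<xsl:output method='text'/>"
        // Only visit <student> children, so whitespace between elements is not copied out.
        + "<xsl:template match='/Students'><xsl:apply-templates select='student'/></xsl:template>"
        + "<xsl:template match='student'>"
        + "<xsl:text>&lt;http://www.coderanch.com/student&gt; &lt;www.xmlschema#typeOf&gt; "
        + "&lt;http://www.coderanch.com/Students&gt;.&#10;</xsl:text>"
        // One triple per attribute; the attribute name becomes the predicate suffix.
        + "<xsl:for-each select='@*'>"
        + "<xsl:text>&lt;http://www.coderanch.com/student&gt; &lt;www.xmlschema#</xsl:text>"
        + "<xsl:value-of select='name()'/>"
        + "<xsl:text>&gt; \"</xsl:text>"
        + "<xsl:value-of select='.'/>"
        + "<xsl:text>\".&#10;</xsl:text>"
        + "</xsl:for-each>"
        // The element's text (including CDATA) becomes the description triple.
        + "<xsl:text>&lt;http://www.coderanch.com/student&gt; "
        + "&lt;www.xmlschema#generaldescription&gt; \"</xsl:text>"
        + "<xsl:value-of select='.'/>"
        + "<xsl:text>\".&#10;</xsl:text>"
        + "</xsl:template>"
        + "</xsl:stylesheet>";

    // Applies the stylesheet to an XML string and returns the text output.
    static String transform(String xml) {
        try {
            Transformer transformer = TransformerFactory.newInstance()
                    .newTransformer(new StreamSource(new StringReader(STYLESHEET)));
            StringWriter out = new StringWriter();
            transformer.transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException("XSLT transformation failed", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<Students>"
                + "<student name=\"john\" age=\"22\" subject=\"geography\">John is good singer</student>"
                + "</Students>";
        System.out.print(transform(xml));
    }
}
```

For the real 125,000-line file, the StreamSource and StreamResult could wrap files instead of strings; the transformation logic stays in the stylesheet rather than in Java code.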
 
Ranch Hand
Posts: 96
I think there are many arguments in favor of one or the other (see for example here) but I do agree that an XSLT transformation seems a logical solution.
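For comparison, the SAX route from the original question might look roughly like the sketch below. It also touches on questions 2 and 3: the name-to-URI mapping is held in a java.util.Properties object, and output goes to any java.io.Writer. The class name SaxNTriplesSketch and the key scheme (element.*, attr.*, text) are assumptions made for illustration. The sketch ignores nested mixed content and omits the typeOf triple.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.StringWriter;
import java.io.Writer;
import java.util.Properties;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical sketch: streaming XML -> N-Triples-style lines with SAX.
class SaxNTriplesSketch {

    static class TripleHandler extends DefaultHandler {
        private final Properties uris;   // assumed keys: element.<name>, attr.<name>, text
        private final Writer out;
        private final StringBuilder text = new StringBuilder();

        TripleHandler(Properties uris, Writer out) {
            this.uris = uris;
            this.out = out;
        }

        @Override
        public void startElement(String ns, String local, String qName, Attributes atts)
                throws SAXException {
            text.setLength(0);
            String subject = uris.getProperty("element." + qName);
            if (subject == null) return;              // unmapped elements are skipped
            for (int i = 0; i < atts.getLength(); i++) {
                String predicate = uris.getProperty("attr." + atts.getQName(i));
                if (predicate != null) emit(subject, predicate, atts.getValue(i));
            }
        }

        @Override
        public void characters(char[] ch, int start, int length) {
            text.append(ch, start, length);           // may be called several times per element
        }

        @Override
        public void endElement(String ns, String local, String qName) throws SAXException {
            String subject = uris.getProperty("element." + qName);
            String predicate = uris.getProperty("text");
            String body = text.toString().trim();
            if (subject != null && predicate != null && !body.isEmpty()) {
                emit(subject, predicate, body);
            }
            text.setLength(0);
        }

        private void emit(String subject, String predicate, String value) throws SAXException {
            try {
                out.write("<" + subject + "> <" + predicate + "> \"" + escape(value) + "\".\n");
            } catch (IOException e) {
                throw new SAXException(e);
            }
        }

        // Minimal escaping of backslash, quote and newline inside a literal.
        private static String escape(String s) {
            return s.replace("\\", "\\\\").replace("\"", "\\\"").replace("\n", "\\n");
        }
    }

    // Parses the XML string and returns the generated triples as text.
    static String convert(String xml, Properties uris) {
        StringWriter out = new StringWriter();
        try {
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), new TripleHandler(uris, out));
        } catch (Exception e) {
            throw new IllegalStateException("SAX conversion failed", e);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        Properties uris = new Properties();
        uris.setProperty("element.student", "http://www.coderanch.com/student");
        uris.setProperty("attr.name", "www.xmlschema#name");
        uris.setProperty("text", "www.xmlschema#generaldescription");
        System.out.print(convert(
                "<Students><student name=\"john\" age=\"22\">John is good singer</student></Students>",
                uris));
    }
}
```

In a batch run, the Properties could be loaded once from a file with Properties.load, and the StringWriter swapped for Files.newBufferedWriter(path, StandardCharsets.UTF_8). With only about 100 mappings, a Properties lookup is a hash lookup, so it is unlikely to be the bottleneck either way.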

Cheers,
Wim
 
John Jai
Rancher
Posts: 1776
Thanks Paul & Wanni for your replies