
How to parse an XML document with a default namespace using JDOM XPath

Jack Bush
Hi All,

I am having difficulty parsing an HTML document that declares a default namespace, using Saxon and the TagSoup parser. The relevant content of this document is as follows:


This program works on the same document without the default namespace, in which case it is not necessary to include the "ns" prefix in the XPath statements (lines 6-7) either. Moreover, in the past I successfully retrieved the content of <a> from the same document (without the default namespace) using "org.apache.xerces.parsers.SAXParser".

I would like to achieve the following objectives if possible:

( i ) Exclude the DTD and the namespace in order to simplify the parsing process. How could this be done?
( ii ) If this is not possible, how do I include the namespace in the XPath statements (lines 6-7) so that the value of <a> is picked up correctly?
( iii ) Would changing from "org.apache.xerces.parsers.SAXParser" to "org.ccil.cowan.tagsoup.Parser" make any difference as far as using XPath is concerned?
( iv ) If the DTD cannot be excluded, how do I change the lookup of the PUBLIC DTD to a local SYSTEM one, so that a local copy of the DTD is used instead?
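For question ( ii ): in XPath 1.0 an unprefixed name such as a only matches elements in no namespace, so once the elements sit in a default namespace, //a selects nothing. The usual fix is to bind a prefix of your choosing to the namespace URI and use it in the expression; with JDOM 1.1's org.jdom.xpath.XPath that is XPath.newInstance("//ns:a") followed by addNamespace("ns", uri) before selectNodes(...). The sketch below shows the same idea with the JDK's own javax.xml.xpath API so it runs without extra jars. The sample document, the XHTML namespace URI and the prefix "ns" are assumptions, since the original snippet is not shown. It also shows the local-name() workaround, which ignores namespaces at the cost of wordier expressions.

```java
import java.io.StringReader;
import java.util.Collections;
import java.util.Iterator;
import javax.xml.XMLConstants;
import javax.xml.namespace.NamespaceContext;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class DefaultNamespaceXPath {

    // Assumed namespace: XHTML, as declared on the sample root element below.
    static final String XHTML_NS = "http://www.w3.org/1999/xhtml";

    // Hypothetical sample document whose elements are in a default namespace.
    static final String SAMPLE =
        "<html xmlns='http://www.w3.org/1999/xhtml'><body>"
        + "<a href='one.html'>One</a><a href='two.html'>Two</a>"
        + "</body></html>";

    /** Parses XML with namespace processing switched on (needed for prefixed XPath). */
    static Document parse(String xml) {
        try {
            DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
            dbf.setNamespaceAware(true);
            return dbf.newDocumentBuilder().parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse document", e);
        }
    }

    /** Evaluates expr with the prefix "ns" bound to XHTML_NS and returns the matching nodes. */
    static NodeList select(Document doc, String expr) {
        XPath xpath = XPathFactory.newInstance().newXPath();
        xpath.setNamespaceContext(new NamespaceContext() {
            public String getNamespaceURI(String prefix) {
                return "ns".equals(prefix) ? XHTML_NS : XMLConstants.NULL_NS_URI;
            }
            public String getPrefix(String uri) {
                return XHTML_NS.equals(uri) ? "ns" : null;
            }
            public Iterator<String> getPrefixes(String uri) {
                return XHTML_NS.equals(uri)
                    ? Collections.singletonList("ns").iterator()
                    : Collections.<String>emptyList().iterator();
            }
        });
        try {
            return (NodeList) xpath.evaluate(expr, doc, XPathConstants.NODESET);
        } catch (Exception e) {
            throw new IllegalStateException("Bad XPath expression: " + expr, e);
        }
    }

    public static void main(String[] args) {
        Document doc = parse(SAMPLE);
        // Unprefixed "a" means "a in no namespace", so this finds nothing.
        System.out.println(select(doc, "//a").getLength());
        // Prefixed name, with "ns" bound to the document's default namespace URI.
        System.out.println(select(doc, "//ns:a").getLength());
        // Namespace-agnostic alternative.
        System.out.println(select(doc, "//*[local-name()='a']").getLength());
    }
}
```

The prefix used in the expression does not have to appear anywhere in the document; only the namespace URI it is bound to has to match.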

I am running JDK 1.6.0_06, NetBeans 6.1, JDOM 1.1, Saxon 6.5.5 and TagSoup 1.2 on Windows XP.

I have also posted this question at http://forums.sun.com/thread.jspa?threadID=5344947&tstart=0

Any assistance would be appreciated.

Thanks in advance,

Jack
 
Jack Bush
Hi All,

I can confirm that XPath (through Saxon) works with the default namespace when the document is parsed with TagSoup ("org.ccil.cowan.tagsoup.Parser"). I had made the mistake of assuming that the XML document produced by TagSoup was identical to the one I previously produced with light_html2xml.

Consequently, what is still outstanding (not critical, but nice to have) is ( i ) excluding the DTD from the XML file or, if that is not possible, ( iv ) setting up an EntityResolver for a local SYSTEM DTD in this JDOM environment.

Below is an example of what I am trying to achieve in ( iv ) in a DOM environment:


Would anyone be able to give me some idea of how to do this?

Thanks,

Jack
 
Carey Evans
It looks to me like org.jdom.input.SAXBuilder has the setValidation() and setEntityResolver() methods you need. Is there some reason these don't work?
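On the setEntityResolver() side: turning validation off does not by itself stop a standard parser such as the JDK's built-in Xerces from reading the external DTD, because by default it still loads it (for example to pick up entity declarations). One common way to skip it is an EntityResolver that returns an empty input for every external entity. This is a sketch, assuming the document does not use entities defined in that DTD (such as &nbsp;), which would otherwise become undefined and fail the parse. The class name SkipDtdResolver and the sample DOCTYPE are hypothetical; the demo uses the JDK's DocumentBuilder, and the same resolver object can be passed to JDOM's SAXBuilder.setEntityResolver(...), since both accept an org.xml.sax.EntityResolver. Note that the DOCTYPE declaration itself stays in the parsed document; only the lookup is suppressed.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.EntityResolver;
import org.xml.sax.InputSource;

/** Hypothetical resolver that replaces every external entity (including the DTD) with nothing. */
class SkipDtdResolver implements EntityResolver {

    public InputSource resolveEntity(String publicId, String systemId) {
        // An empty stream is a valid, empty external DTD subset, so no network access happens.
        return new InputSource(new StringReader(""));
    }

    /** Parses xml without validation and without fetching the DTD named in its DOCTYPE. */
    static Document parseOffline(String xml) {
        try {
            DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
            dbf.setValidating(false);
            DocumentBuilder db = dbf.newDocumentBuilder();
            db.setEntityResolver(new SkipDtdResolver());
            return db.parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException("Parse failed", e);
        }
    }

    public static void main(String[] args) {
        // The system identifier deliberately points at an unresolvable host.
        String xml = "<!DOCTYPE html PUBLIC '-//W3C//DTD XHTML 1.0 Strict//EN' "
            + "'http://example.invalid/xhtml1-strict.dtd'><html><body/></html>";
        Document doc = parseOffline(xml);
        System.out.println(doc.getDocumentElement().getNodeName());
        // The DOCTYPE node is still present in the DOM.
        System.out.println(doc.getDoctype() != null);
    }
}
```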
 
Jack Bush
Hi Carey,

Thanks for responding to this question.

Below is where the SAX parser is set up:

line 1. SAXBuilder saxBuilder = new SAXBuilder("org.ccil.cowan.tagsoup.Parser", false);
line 2. saxBuilder.setValidation(false);
line 3. saxBuilder.setEntityResolver(???);

( a ) Are you referring to the boolean parameter in line 1 and the call in line 2? Are they equivalent? This setting appears to work, since no Internet connection is needed to parse the XML file. However, I am wondering whether it is possible to exclude the DOCTYPE from the converted XML document altogether during parsing/conversion. Otherwise, how could I use line 3 to set a local SYSTEM DTD? I am looking for something like setEntityResolver(false) so that I could open the document without it referencing the PUBLIC DTD.
( b ) I would also like to keep the namespace out of the document during parsing/conversion in order to simplify my XPath searches. Is that possible?

Thanks again,

Jack
 
Carey Evans
TagSoup lets you disable namespaces by setting the standard SAX feature "http://xml.org/sax/features/namespaces" to false. Unfortunately for you, JDOM will turn it back on before parsing. You might need to use a standard DOM instead; Java 5 and Java 6 have built-in XPath support.
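For reference, switching that feature off on a plain SAX XMLReader looks like the sketch below. It uses the JDK's built-in SAX parser so it runs stand-alone; TagSoup's org.ccil.cowan.tagsoup.Parser is also an XMLReader, so the same setFeature call applies to it. The class name and sample markup are hypothetical. With namespace processing off, start tags arrive with an empty namespace URI and their raw names in qName, which is what lets unprefixed names match; as Carey says, this does not help if JDOM switches the feature back on before parsing.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

class NamespacesOffDemo {

    // Standard SAX2 feature name; every conforming XMLReader must recognise it.
    static final String NAMESPACES_FEATURE = "http://xml.org/sax/features/namespaces";

    /** Returns "{uri}qName" for every start tag, parsed with namespace processing disabled. */
    static List<String> startTags(String xml) {
        final List<String> tags = new ArrayList<String>();
        try {
            XMLReader reader = SAXParserFactory.newInstance().newSAXParser().getXMLReader();
            reader.setFeature(NAMESPACES_FEATURE, false);
            reader.setContentHandler(new DefaultHandler() {
                @Override
                public void startElement(String uri, String localName, String qName, Attributes atts) {
                    tags.add("{" + uri + "}" + qName);
                }
            });
            reader.parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException("Parse failed", e);
        }
        return tags;
    }

    public static void main(String[] args) {
        String xml = "<html xmlns='http://www.w3.org/1999/xhtml'><body><a href='x.html'>x</a></body></html>";
        System.out.println(startTags(xml));
    }
}
```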

You can pass the same EntityResolver you used with DocumentBuilder in your previous sample to saxBuilder.setEntityResolver(...), or use the one from the Apache XML Commons Resolver library, to load the DTD from a local file.
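A minimal sketch of the local-file approach, assuming a copy of the DTD has already been saved on disk: the resolver looks up the PUBLIC identifier in a map and, on a hit, returns an InputSource whose system ID is the local file's URL, so the parser reads the DTD (and resolves relative references inside it) from disk. Returning null lets the parser fall back to its default behaviour for anything not in the map. The class name LocalDtdResolver, the map and the file path are hypothetical. The resolver does not check that the file exists, so a wrong path only shows up as an error when the parser opens it. For many DTDs, the XML Commons Resolver's OASIS catalog support does this mapping for you.

```java
import java.io.File;
import java.util.HashMap;
import java.util.Map;
import org.xml.sax.EntityResolver;
import org.xml.sax.InputSource;

/** Hypothetical resolver that redirects known PUBLIC identifiers to local DTD files. */
class LocalDtdResolver implements EntityResolver {

    private final Map<String, File> localCopies;

    LocalDtdResolver(Map<String, File> localCopies) {
        this.localCopies = new HashMap<String, File>(localCopies);
    }

    public InputSource resolveEntity(String publicId, String systemId) {
        File local = (publicId == null) ? null : localCopies.get(publicId);
        if (local == null) {
            return null; // unknown entity: let the parser use the original system ID
        }
        // A file: URL as system ID, so relative references inside the DTD resolve on disk.
        // The file's existence is not checked here.
        InputSource source = new InputSource(local.toURI().toString());
        source.setPublicId(publicId);
        return source;
    }

    public static void main(String[] args) {
        Map<String, File> copies = new HashMap<String, File>();
        // Assumed location of a locally saved copy of the XHTML 1.0 Strict DTD.
        copies.put("-//W3C//DTD XHTML 1.0 Strict//EN", new File("dtd/xhtml1-strict.dtd"));
        LocalDtdResolver resolver = new LocalDtdResolver(copies);
        InputSource hit = resolver.resolveEntity("-//W3C//DTD XHTML 1.0 Strict//EN",
                "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd");
        System.out.println(hit.getSystemId());
        // With JDOM 1.1 this would be: saxBuilder.setEntityResolver(resolver);
    }
}
```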
 