
empty text nodes

 
Ranch Hand
Posts: 3244
Hey everyone,
I searched here, I've searched Google, and I've searched the Apache and Sun sites too, but I can't find a definitive answer.
I have a simple XML document and a simple application I wrote to parse it and echo it back to the screen. The XML document has an internal DTD, and the declaration for the root element says it may contain only other named elements (no text); each of its child elements contains #PCDATA.
When I parse the document I get all sorts of text nodes that have no content. From what I can see, they are caused by the whitespace (newlines, spaces, etc.) between the elements.
I have tried the DocumentBuilderFactory method setIgnoringElementContentWhitespace(true), I have set the parser to validating, and I've tried the normalize() method on the root element too, and none of it seems to work.
Can anyone tell me how to stop the parser from returning all of these? Or do we have to test each node as we get to it to see whether it contains anything other than whitespace?
thanks
Dave
 
Ranch Hand
Posts: 209
Usually I write a loop that deletes all the whitespace-only text nodes.
Or you can translate the document so that whitespace is ignored (I forgot the tag):
xml --translate--> xml (without whitespace)
Chu
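The loop Chu mentions could look roughly like the sketch below. It is only an illustration: the class name WhitespaceStripper, the helper names, and the sample document are all made up for this example, and it assumes any JAXP DOM parser on the classpath.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

// Hypothetical helper, not code from the thread.
class WhitespaceStripper {

    // Recursively removes every text node that holds nothing but whitespace.
    static void stripWhitespace(Node parent) {
        Node child = parent.getFirstChild();
        while (child != null) {
            // Remember the next sibling first; removeChild detaches the current node.
            Node next = child.getNextSibling();
            if (child.getNodeType() == Node.TEXT_NODE
                    && child.getNodeValue().trim().length() == 0) {
                parent.removeChild(child);
            } else if (child.getNodeType() == Node.ELEMENT_NODE) {
                stripWhitespace(child);
            }
            child = next;
        }
    }

    // Parses an XML string with a default (non-validating) parser.
    static Document parse(String xml) {
        try {
            DocumentBuilder builder =
                    DocumentBuilderFactory.newInstance().newDocumentBuilder();
            return builder.parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }

    // Counts the direct children of the document's root element.
    static int rootChildCount(Document doc) {
        return doc.getDocumentElement().getChildNodes().getLength();
    }

    public static void main(String[] args) {
        // Made-up sample document with indentation between the elements.
        Document doc = parse(
                "<people>\n  <name>Dave</name>\n  <name>Chu</name>\n</people>");
        System.out.println("children before: " + rootChildCount(doc));
        stripWhitespace(doc.getDocumentElement());
        System.out.println("children after: " + rootChildCount(doc));
    }
}
```

One caveat with this approach: it does not consult the DTD, so it also drops whitespace-only text inside elements declared as #PCDATA (for example an element whose whole content is a single space). If that whitespace matters, the check would have to be limited to the elements known to have element-only content.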
 
Leverager of our synergies
Posts: 10065
Hey, wait, what about org.w3c.dom.Node class? :roll: Where did you find it? Just curious.
Generally, the more I read posts in this forum, the more I tend to agree with Michael Ernest's reactionary position, that XML brings more problems than it solves. XML was intended as a simple tool to solve our problems. Now it becomes the major violator of KISS (Keep It Small and Simple) principle.
Look at this program from JDJ that uses 1 (one) RegEx to parse the tag structure similar to XML tree:

http://www.sys-con.com/java/archivesa.cfm?volume=07&issue=01
Using Regular Expressions in J2SE 1.4 Source Code
Essentially, the program is one expression: <(.*)>(.*)</\\1> This is about as complex as it should be.
Now compare it to how XML parsing is done in Java: a sophisticated, object-oriented, interface-based, multi-layered architecture where one layer of indirection is placed above another, and all one can get out of this monster are exceptions (example1 example2), promptly delivered from one of the middle layers.
I am ready to agree that if your job is to XMLize a giant company, then this complexity is justified, otherwise... :roll:
What do you think?
[ April 25, 2002: Message edited by: Mapraputa Is ]
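For anyone who wants to see what that expression does, here is a small sketch using java.util.regex (which needs J2SE 1.4 or later). Only the pattern comes from the post above; the class TagRegexDemo and its sample inputs are made up for illustration and are not the JDJ program.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class; only the pattern itself is quoted from the post.
class TagRegexDemo {

    // Group 1 captures the tag name, group 2 the content, and \1 requires
    // the closing tag to repeat whatever group 1 captured.
    static final Pattern TAG = Pattern.compile("<(.*)>(.*)</\\1>");

    // Returns {tagName, content} for the first match, or null if nothing matches.
    static String[] match(String input) {
        Matcher m = TAG.matcher(input);
        if (!m.find()) {
            return null;
        }
        return new String[] { m.group(1), m.group(2) };
    }

    public static void main(String[] args) {
        String[] parts = match("<name>Dave</name>");
        System.out.println(parts[0] + " -> " + parts[1]);

        // By default '.' does not match line terminators, so the same element
        // spread over several lines is not found by this pattern.
        System.out.println(match("<name>\nDave\n</name>") == null);
    }
}
```

The second call hints at where the one-liner stops being enough: content that spans lines, tags with attributes, and nested elements with the same name all need more than this single expression.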
 
Dave Vick
Ranch Hand
Posts: 3244
Hey, wait, what about org.w3c.dom.Node class? :roll: Where did you find it? Just curious.
I didn't. I was using JBuilder Personal for my development, but it used its own libraries and I couldn't update them or use different ones, so I tried working from the command line and must have had my classpath wrong or something. I just downloaded the new version of Forte to try; the last one was too slow for me and kept locking up my computer. With Forte it all seems to work all right now.
...that XML brings more problems than it solves. XML was intended as a simple tool to solve our problems. Now it becomes the major violator of KISS (Keep It Small and Simple) principle.
I'm not familiar enough with it yet to make that kind of statement but it certainly isn't as easy to handle as I thought it would be. Or maybe I should say there are more moving parts than I thought.
Thanks for the regex code. I'll take a look at it, but for now I have to stick with 1.3 code.
Here is what I ended up doing in my switch statement, after I know it is a text node:
Text t = (Text) nd;
if (!t.getNodeValue().trim().equals("")) {
    // do something with the node
}
But I'd really like to have the parser just ignore them completely. It seems to me that with a rather large document, if every newline and space between elements produces another node to process, that could add a lot to the time and memory requirements of parsing the document.
Thanks
[ April 26, 2002: Message edited by: Dave Vick ]
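On having the parser drop these nodes itself: the JAXP documentation for setIgnoringElementContentWhitespace says it only applies to whitespace inside elements whose DTD content model is element-only, and that the parser must be in validating mode. A minimal sketch of that setup follows. The class name, the inline document, and the no-op warning handler are assumptions made for this example, and whether it behaves this way on a given setup depends on the parser implementation found on the classpath.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.ErrorHandler;
import org.xml.sax.InputSource;
import org.xml.sax.SAXParseException;

// Hypothetical demo class; the document below is a stand-in for the one
// described in the thread (internal DTD, element-only root, #PCDATA children).
class IgnorableWhitespaceDemo {

    static final String XML =
          "<?xml version=\"1.0\"?>\n"
        + "<!DOCTYPE people [\n"
        + "  <!ELEMENT people (name+)>\n"
        + "  <!ELEMENT name (#PCDATA)>\n"
        + "]>\n"
        + "<people>\n"
        + "  <name>Dave</name>\n"
        + "  <name>Chu</name>\n"
        + "</people>";

    // Parses XML in validating mode and returns the number of direct
    // children of the root element.
    static int countRootChildren(boolean ignoreWhitespace) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setValidating(true); // the DTD content model is needed
            factory.setIgnoringElementContentWhitespace(ignoreWhitespace);
            DocumentBuilder builder = factory.newDocumentBuilder();
            // Rethrow validation problems instead of letting the default
            // handler print them; warnings are ignored in this sketch.
            builder.setErrorHandler(new ErrorHandler() {
                public void warning(SAXParseException e) { }
                public void error(SAXParseException e) throws SAXParseException {
                    throw e;
                }
                public void fatalError(SAXParseException e) throws SAXParseException {
                    throw e;
                }
            });
            Document doc = builder.parse(new InputSource(new StringReader(XML)));
            return doc.getDocumentElement().getChildNodes().getLength();
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("keeping whitespace:  " + countRootChildren(false));
        System.out.println("ignoring whitespace: " + countRootChildren(true));
    }
}
```

Both settings have to be applied to the factory before newDocumentBuilder() is called, since the builder is configured at creation time. If the whitespace nodes still show up with this configuration, manually testing each text node, as in the post above, remains the fallback.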
 