XML parser fails on XML files with tabs

 
Chris Crawford
Ranch Hand
Here's a weird one: I've been using an XML parser with Java for more than 15 years. XML parsers are a bit messy, but they work -- until now. I have a long XML file, but managed to trim the problem down to three lines. These three lines are parsed correctly:



But these three lines produce screwy results:



Now, to make this even more screwball, the tabs are inserted by the method that saves the XML file, which uses the same Java XML libraries! In other words, the Java XML libraries write files that they cannot read!

The two Java XML modules that I am using are java.xml and java.xml.crypto from JDK 15.0.

This is just too crazy. Somehow, somewhere, I must be doing something wrong. But this code has worked in the past. My best guess now is that it is somehow arising from the latest JDK and JVM. Perhaps they're mismatched. But I never had this problem before I updated to the latest version of Eclipse and the latest JVM.

Do you think I need to take some sort of psychoactive drug to comprehend the problem? 😄
 
Tim Holloway
Saloon Keeper
I'd say it's not Java, and it's not the parser, it's the way you use the parser.

Strictly speaking, XML is not "pretty-printed". That is, the only time you should be seeing end-of-line, tab, space, or other such characters in an XML stream is when they are an actual working part of an element body. In fact, a parser like SAX would typically be expected to take all the characters (visible or not) that sit between elements, including between sub-elements, and aggregate them into a single body text. Sane XML usually doesn't treat such body aggregates as meaningful, so the application would discard them. An exception would be something done à la HTML, where you might have sub-elements inside something like a "p" element for boldfacing, underlining and/or italicizing the body text. That's a whole different can of worms.

So your problem boils down to the fact that whoever's reading the XML didn't allow for tab characters in the "noise" areas.
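One way for the reading side to allow for that "noise" is a minimal SAX handler like the one below. It assumes data-style XML with no mixed content: it buffers characters() for each element and ignores runs that are pure whitespace (tabs, spaces, newlines). The class and method names are made up for illustration, and the trim() also drops leading and trailing whitespace inside real text, which may not suit every file.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

public class WhitespaceTolerantReader {

    // Collects the trimmed text of every element whose text is not pure whitespace.
    static List<String> readTexts(String xml) throws Exception {
        List<String> texts = new ArrayList<>();
        DefaultHandler handler = new DefaultHandler() {
            private final StringBuilder buffer = new StringBuilder();

            @Override
            public void startElement(String uri, String localName, String qName, Attributes atts) {
                // Anything buffered before a child element starts is inter-element "noise".
                buffer.setLength(0);
            }

            @Override
            public void characters(char[] ch, int start, int length) {
                // SAX may deliver one text run in several chunks, so accumulate them.
                buffer.append(ch, start, length);
            }

            @Override
            public void endElement(String uri, String localName, String qName) {
                String text = buffer.toString().trim();
                if (!text.isEmpty()) {
                    texts.add(text);
                }
                buffer.setLength(0);
            }
        };
        SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), handler);
        return texts;
    }

    public static void main(String[] args) throws Exception {
        // Pretty-printed input: tabs and newlines between the elements.
        String pretty = "<root>\n\t<item>a</item>\n\t<item>b</item>\n</root>";
        System.out.println(readTexts(pretty)); // [a, b]
    }
}
```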
 
Chris Crawford
Ranch Hand
Thanks much for the quick answer!

My problem here is that the method that SAVES the XML file is pretty-printing it, but the method that LOADS the file, of course, rejects the pretty-printed XML. So I start with a proper XML file, it loads correctly, I make a tiny edit, then save the XML file. Examining it in a text editor, I see that it is pretty-printed. When I then try to load the XML file into my program, the program of course chokes on the pretty-printed file.

By the way, I do have this line of code inserted into both the load and save methods:

factory.setIgnoringElementContentWhitespace(true);

So, should I work on the save method, trying to get it to NOT pretty-print?
Or should I work on the load method, trying to get it to work with pretty-printed XML files?
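For the load side, a common approach (a sketch, not the only fix) is to parse normally and then remove text nodes that contain nothing but whitespace, so the rest of the program never sees the tabs and newlines. It assumes the file has no mixed content where whitespace is meaningful; StripWhitespace, stripWhitespaceNodes and load are hypothetical names.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

public class StripWhitespace {

    // Recursively removes text nodes that consist only of whitespace.
    static void stripWhitespaceNodes(Node node) {
        NodeList children = node.getChildNodes(); // live list: shrinks as nodes are removed
        // Walk backwards so removals don't shift the indexes still to be visited.
        for (int i = children.getLength() - 1; i >= 0; i--) {
            Node child = children.item(i);
            if (child.getNodeType() == Node.TEXT_NODE
                    && child.getTextContent().trim().isEmpty()) {
                node.removeChild(child);
            } else if (child.getNodeType() == Node.ELEMENT_NODE) {
                stripWhitespaceNodes(child);
            }
        }
    }

    // Parses the XML and cleans out the pretty-printing whitespace.
    static Document load(String xml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        stripWhitespaceNodes(doc.getDocumentElement());
        return doc;
    }

    public static void main(String[] args) throws Exception {
        Document doc = load("<root>\n\t<item>a</item>\n\t<item>b</item>\n</root>");
        // Only the two <item> elements remain under <root>.
        System.out.println(doc.getDocumentElement().getChildNodes().getLength()); // 2
    }
}
```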
 
Paul Clapham
Marshal

Chris Crawford wrote: Do you think I need to take some sort of psychoactive drug to comprehend the problem? 😄

That could be helpful. Even a couple of beers might go a long way.

But seriously: I just installed the latest version of Eclipse and applied the Java 15 patch. Then I noticed that part of the application, which I had written over the last dozen years, was failing silently. It used XML serialization to produce XML from data stored in the application, and an error message I couldn't really understand was being produced and ignored. The error message came from Saxon, which I had included a very long time ago when XSLT wasn't well supported in Java. XSLT wasn't being used in this situation, but Saxon was doing the XML serialization anyway. So I removed Saxon from the build path and the problem stopped happening.

So then I looked at the data for that part of the application and found that it hadn't been updated correctly since June. I had updated Eclipse in March after several years of using Eclipse Neon, also upgrading from Java 8 to Java 13, and then I upgraded to Java 14 in June when the new version of Eclipse supported it. At least that's my approximate recollection of the process; I didn't keep a log of what I did. (The data wasn't permanently lost, because it came from an external source where I could go and re-fetch it.)

So yeah, it's not impossible that funny stuff happens with XML in recent Java versions. In your case I would suspect javax.xml.crypto as most likely to have been changed.
 
Tim Holloway
Saloon Keeper
Java XML is designed for plug-replaceable modules. If you're getting whitespace output from a document save operation, either you didn't switch on the option soon enough or the document saver may be defective. In that case you might be able to plug in an alternative serializer implementation (usually via a "-D" system property on the application's command line).

Similarly, you should be able to swap out the SAX parser that's used to digest the incoming XML.
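Before swapping anything, it can help to check which implementation the application is actually getting, since a jar on the classpath (Saxon, in the story above) can be picked up silently. In JAXP, newInstance() follows the documented lookup (system property, jaxp.properties, ServiceLoader providers, then the built-in default), while newDefaultInstance() (Java 9 and later) always returns the JDK's built-in implementation. A small diagnostic sketch; the class name is arbitrary.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.SAXParserFactory;
import javax.xml.transform.TransformerFactory;

public class WhichXmlImplementation {

    public static void main(String[] args) {
        // newInstance() honours -D system properties, jaxp.properties and ServiceLoader providers.
        System.out.println("Transformer (lookup):  "
                + TransformerFactory.newInstance().getClass().getName());
        System.out.println("SAX parser (lookup):   "
                + SAXParserFactory.newInstance().getClass().getName());
        System.out.println("DOM builder (lookup):  "
                + DocumentBuilderFactory.newInstance().getClass().getName());

        // newDefaultInstance() (Java 9+) ignores that lookup and returns the JDK built-in.
        System.out.println("Transformer (default): "
                + TransformerFactory.newDefaultInstance().getClass().getName());
        System.out.println("SAX parser (default):  "
                + SAXParserFactory.newDefaultInstance().getClass().getName());
    }
}
```

To switch implementations from the command line instead, the documented system properties are javax.xml.transform.TransformerFactory, javax.xml.parsers.SAXParserFactory and javax.xml.parsers.DocumentBuilderFactory, each set with -D to the fully qualified name of a factory class.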

As for proper usage in a bug-free situation, saving without whitespace is more efficient for XML storage and data transmission. The spacing is there to make the document more human-readable, and if humans aren't usually going to read the XML, it's just a waste. And if you occasionally need to read "ugly" XML in an IDE like Eclipse, you can always open it in an editor and use the Source > Format option to make the editor pretty-print it.
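On the save side, a minimal sketch of writing a DOM without pretty-printing, assuming the save method uses a JAXP identity Transformer (the original save code isn't shown, and CompactSave, toCompactXml and sampleDocument are hypothetical names). OutputKeys.INDENT set to "no" tells the serializer not to add line breaks or indentation of its own.

```java
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

public class CompactSave {

    // Serializes a DOM without adding indentation or line breaks between elements.
    static String toCompactXml(Document doc) throws Exception {
        Transformer transformer = TransformerFactory.newInstance().newTransformer();
        transformer.setOutputProperty(OutputKeys.INDENT, "no");
        transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
        StringWriter out = new StringWriter();
        transformer.transform(new DOMSource(doc), new StreamResult(out));
        return out.toString();
    }

    // Builds <root><item>a</item><item>b</item></root> in memory.
    static Document sampleDocument() throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
        Element root = doc.createElement("root");
        doc.appendChild(root);
        for (String text : new String[] {"a", "b"}) {
            Element item = doc.createElement("item");
            item.setTextContent(text);
            root.appendChild(item);
        }
        return doc;
    }

    public static void main(String[] args) throws Exception {
        System.out.println(toCompactXml(sampleDocument()));
    }
}
```

INDENT="no" only stops the serializer from adding whitespace. Whitespace text nodes already present in the DOM (for example, from loading a pretty-printed file) are written back out unchanged, so this works best together with cleaning those nodes up at load time.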
 
Paul Clapham
Marshal

Tim Holloway wrote: If you're getting whitespace output from a document save operation, either you didn't switch on the option soon enough or the document saver may be defective.

I would expect that "ignore element content whitespace" would only apply to the parser reading elements which consist entirely of whitespace. I would be surprised if it prevented the serializer from writing elements consisting only of whitespace.
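The Javadoc for DocumentBuilderFactory.setIgnoringElementContentWhitespace adds a further condition: only whitespace in element-only content, as defined by a content model, is eliminated, so the setting requires the parser to be in validating mode. Without a DTD to validate against, the flag has no visible effect, which would explain why it didn't help here if the file has no DTD (an assumption, since the file isn't shown). A sketch comparing both cases; the class name and the sample DTD are made up for illustration.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.ErrorHandler;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;

public class IgnoreWhitespaceDemo {

    // Counts the child nodes of the root element, including whitespace text nodes.
    static int rootChildCount(String xml, boolean validating) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setIgnoringElementContentWhitespace(true);
        // The flag above only takes effect when the parser validates against a content model.
        factory.setValidating(validating);
        DocumentBuilder builder = factory.newDocumentBuilder();
        // Fail loudly on validation errors instead of printing default warnings.
        builder.setErrorHandler(new ErrorHandler() {
            @Override public void warning(SAXParseException e) { }
            @Override public void error(SAXParseException e) throws SAXException { throw e; }
            @Override public void fatalError(SAXParseException e) throws SAXException { throw e; }
        });
        Document doc = builder.parse(
                new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        return doc.getDocumentElement().getChildNodes().getLength();
    }

    public static void main(String[] args) throws Exception {
        String body = "<root>\n\t<item>a</item>\n\t<item>b</item>\n</root>";
        String dtd = "<!DOCTYPE root [<!ELEMENT root (item*)><!ELEMENT item (#PCDATA)>]>";
        // No DTD: the tabs and newlines survive as text nodes (5 children).
        System.out.println(rootChildCount(body, false));
        // DTD + validation: whitespace in element-only content is dropped (2 children).
        System.out.println(rootChildCount(dtd + body, true));
    }
}
```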
 
Chris Crawford
Ranch Hand
For the time being I have a workaround: I drop the newly written XML file into my text editor, erase all the runs of whitespace, then save that copy. Yes, it's a hack, but it's easier than fighting Java, which is like fighting an octopus with one hand tied behind your back. I'll keep fooling around. My next exploration will be of older programs that used XML files.
 
Tim Holloway
Saloon Keeper

Paul Clapham wrote:

Tim Holloway wrote: If you're getting whitespace output from a document save operation, either you didn't switch on the option soon enough or the document saver may be defective.

I would expect that "ignore element content whitespace" would only apply to the parser reading elements which consist entirely of whitespace. I would be surprised if it prevented the serializer from writing elements consisting only of whitespace.

Yeah. I double-checked the docs, and it does include the key phrase "when parsing".
 