DOM String output is not the same as original XML

 
Ranch Hand
Posts: 210
I'm trying to extract an element (and all its children) from a given XML file as a string. However, I want to retain it exactly as it appears in the source. Basically, I want the parser to leave the XML I'm extracting alone.

To do this, I build a DOM document from the XML string, search for the desired element, and then write that element to a String via a Transformer.
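The approach described above can be sketched roughly as follows. This is only an illustration using the standard JAXP classes; the class name DomExtract, the method extractElement, and the sample document are assumptions made for the example, not the code from the original post.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

public class DomExtract {

    // Hypothetical helper: parse the XML, find the first element with the
    // given local name (in any namespace), and serialize it back to a String.
    public static String extractElement(String xml, String localName) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true);
        DocumentBuilder builder = factory.newDocumentBuilder();
        Document doc = builder.parse(new InputSource(new StringReader(xml)));

        // "*" matches elements in any namespace, including none.
        Node node = doc.getElementsByTagNameNS("*", localName).item(0);
        if (node == null) {
            return null;
        }

        Transformer transformer = TransformerFactory.newInstance().newTransformer();
        transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
        StringWriter out = new StringWriter();
        transformer.transform(new DOMSource(node), new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        // Sample input is an assumption for illustration only.
        String xml = "<root><person><name>Ann</name></person></root>";
        System.out.println(extractElement(xml, "person"));
    }
}
```

The serializer rebuilds the text from the DOM tree, so the result is equivalent XML but not necessarily the same characters as the input.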

The problem is that the output XML is no longer identical to the original XML.

The source XML might be like this:



I extract the 'person' element and write it back to String, then it looks like this:



Now, I know that the input XML contains redundant namespace declarations, and that the output XML is cleaner. But in this case I want the output XML to be exactly the same as the input XML:

- Whitespace (spaces, tabs, line feeds, carriage returns, and so on) must be retained
- No namespace optimization whatsoever

So what I'm looking for is a 'substring()' on the original XML. The problem is that a real substring is not that simple on XML, and it is probably the least preferred and least clean solution.

I tried the 'Transformer' class from Xalan, which offers a lot of configuration. I managed to configure it so that it leaves the indentation alone, but it still optimizes the namespaces and also removes line feeds between namespace declarations. If I have

<someElement xmlns:ns1="test1"
xmlns:ns2="test2"></someElement>

Then the output looks like:

<someElement xmlns:ns1="test1" xmlns:ns2="test2"></someElement>


Any advice is welcome!
 
Author and all-around good cowpoke
Posts: 13078
Parsers and Transformers are concerned with creating legal DOMs and writing legal XML documents, not with conserving exact formats. For example, there is no guarantee that the order of attributes will be preserved, and of course there is the well-known example that there are two legal ways to write an empty element.

That's just a fact of life. If you want a particular exact format, you are probably going to have to create it yourself.
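To illustrate the point, here is a small sketch showing that two differently written documents produce equal DOM trees, so the parser has nothing left from which to reproduce the original spelling. The class name LexicalVsDom and the method sameDom are assumptions made for this example; isEqualNode is the standard DOM Level 3 comparison.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

public class LexicalVsDom {

    // Parse a string and return its root element.
    private static Element root(String xml) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true);
        DocumentBuilder builder = factory.newDocumentBuilder();
        return builder.parse(new InputSource(new StringReader(xml))).getDocumentElement();
    }

    // Hypothetical helper: true when both strings parse to equal DOM trees.
    public static boolean sameDom(String a, String b) throws Exception {
        return root(a).isEqualNode(root(b));
    }

    public static void main(String[] args) throws Exception {
        // Different attribute order and a different empty-element form.
        String first = "<a x=\"1\" y=\"2\"></a>";
        String second = "<a y=\"2\" x=\"1\"/>";
        System.out.println(first.equals(second)); // the strings differ
        System.out.println(sameDom(first, second)); // the DOM trees are equal
    }
}
```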

Bill
 
author
Posts: 23893

So what I'm looking for is a 'substring()' on the original XML. The problem is that a real substring is not that simple on XML, and it is probably the least preferred and least clean solution.



If all you want is to extract substrings, and the fields that you are trying to extract are simple (meaning a field doesn't nest another field with the same element name), you can use regular expressions to extract the substring. Try...
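A minimal sketch of this regular-expression approach might look like the following. It is not necessarily the expression originally suggested here; the class name RegexExtract, the method extractRaw, and the sample input are assumptions for illustration. It only works under the restriction stated above (no nested element with the same name), and it also assumes the element is not written in self-closing form and that no attribute value contains a '>' character.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RegexExtract {

    // Hypothetical helper: return the raw text of the first element with the
    // given qualified name (prefix included), exactly as written in the input,
    // or null if no match is found.
    public static String extractRaw(String xml, String qName) {
        String q = Pattern.quote(qName);
        // Start tag with optional attributes, then the shortest possible
        // content up to the first matching end tag. DOTALL lets '.' span lines.
        Pattern p = Pattern.compile(
                "<" + q + "(\\s[^>]*)?>.*?</" + q + "\\s*>",
                Pattern.DOTALL);
        Matcher m = p.matcher(xml);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        // Sample input is an assumption for illustration only.
        String xml = "<root>\n"
                + "  <ns1:person xmlns:ns1=\"test1\"\n"
                + "      xmlns:ns2=\"test2\"><name>Ann</name></ns1:person>\n"
                + "</root>";
        // Line feeds and namespace declarations come back untouched.
        System.out.println(extractRaw(xml, "ns1:person"));
    }
}
```

Because the match is taken directly from the input string, nothing is parsed or re-serialized, which is what keeps the whitespace and redundant namespace declarations intact.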



Henry
 