
XML replacing char

 
Greenhorn
Posts: 3
First of all, I'm new to programming, so don't be too hard on me.
Second, my English is not very good.

Now let's go to the question.
I have an XML file that looks like:



As you can see, <result$> and <test$> are not valid.
What I need is a piece of Java code that removes those elements.

Below you can find the code that I have so far, but I'm getting a SAXException (caused by the $ sign).
Do you have any idea how I can remove the invalid elements from the XML file and create a new, valid XML file?

Thanks in advance!

 
Sheriff
Posts: 10445
227
IntelliJ IDE Ubuntu
  • Likes 1
Frank, welcome to CodeRanch!

Where does the XML content in that file come from? Ideally, whatever is writing out that content should be fixed.

If that's not possible, then in the code where you are trying to fix it, instead of reading it as XML, I would suggest that you read it as a plain file (using the File APIs) and do a simple replace on that particular element name (using the String APIs).
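A minimal sketch of that plain-text approach, for anyone following along. The class name, the method names and the replacement name `result_invalid` are made up for illustration, and UTF-8 is assumed as the file encoding; the replacement name must not clash with an element that already exists in the document.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

class PlainTextFix {

    // Renames one known-bad element name in both its opening and closing tags.
    // String.replace works on literal text, so the '$' needs no escaping.
    static String renameElement(String xml, String badName, String goodName) {
        return xml.replace("<" + badName + ">", "<" + goodName + ">")
                  .replace("</" + badName + ">", "</" + goodName + ">");
    }

    // Reads the file as plain text (UTF-8 assumed), applies the rename,
    // and writes the result to a new file.
    static void fixFile(Path in, Path out) throws IOException {
        String xml = new String(Files.readAllBytes(in), StandardCharsets.UTF_8);
        String fixed = renameElement(xml, "result$", "result_invalid");
        Files.write(out, fixed.getBytes(StandardCharsets.UTF_8));
    }
}
```

This only renames tags written exactly as `<result$>` and `</result$>`; tags with attributes or extra whitespace would need a more careful match.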

Furthermore, I think you could do all of this in a simple scripting language instead of Java.


 
Bartender
Posts: 1166
17
Netbeans IDE Java Linux

Jaikiran Pai wrote:Frank, welcome to CodeRanch!
I would suggest that you read it as a plain file (using the File APIs) and do a simple replace on that particular element name (using the String APIs).



Of course, if one uses this approach, one must take into account the character encoding specified in the first line of the file, or use UTF-8 if no character encoding is specified.
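As a rough sketch of what that could look like: the class below (a hypothetical name) peeks at the start of the raw bytes, looks for an `encoding="..."` pseudo-attribute in the XML declaration, and falls back to UTF-8 otherwise. It assumes an ASCII-compatible encoding, so it would not recognise a UTF-16 file.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class XmlEncodingSniffer {

    // Matches the encoding pseudo-attribute of an XML declaration at the very start of the text.
    private static final Pattern ENCODING = Pattern.compile(
            "^<\\?xml[^>]*\\bencoding\\s*=\\s*[\"']([A-Za-z][A-Za-z0-9._-]*)[\"']");

    static Charset detect(byte[] content) {
        // ISO-8859-1 maps every byte to one char, so decoding the head can never fail.
        int len = Math.min(content.length, 200);
        String head = new String(content, 0, len, StandardCharsets.ISO_8859_1);

        // Skip a UTF-8 byte order mark (EF BB BF) if one is present.
        if (head.startsWith("\u00EF\u00BB\u00BF")) {
            head = head.substring(3);
        }

        Matcher m = ENCODING.matcher(head);
        if (m.find()) {
            try {
                return Charset.forName(m.group(1));
            } catch (IllegalArgumentException e) {
                // Illegal or unsupported charset name: fall through to the default.
            }
        }
        return StandardCharsets.UTF_8;
    }
}
```

The returned charset can then be used to decode the file before doing any string replacement, and to encode it again when writing the result.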

If the OP has just one file to process, then using a text editor with the appropriate encoding would be the easiest approach. If the OP has multiple files, then, as Jaikiran suggests, writing a script is in order.

One approach I have used in the past when processing corrupt XML files is to write a filter that is applied to the input stream before it is passed to the XML parser, but this may be overkill if the OP is writing the XML straight back out. I use the Knuth-Morris-Pratt algorithm since it does not require any backtracking through the input.
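To give an idea of what such a filter might look like (this is an illustrative sketch, not the filter I actually use, and all names are hypothetical): the method below copies characters from a Reader to a Writer and replaces every non-overlapping occurrence of a pattern. Each input character is read exactly once, and the Knuth-Morris-Pratt failure table decides how much of a partial match can be released to the output.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.StringWriter;
import java.io.UncheckedIOException;
import java.io.Writer;

class KmpReplaceFilter {

    // fail[i] = length of the longest proper prefix of p[0..i] that is also its suffix.
    static int[] failureTable(String p) {
        int[] fail = new int[p.length()];
        int k = 0;
        for (int i = 1; i < p.length(); i++) {
            while (k > 0 && p.charAt(i) != p.charAt(k)) k = fail[k - 1];
            if (p.charAt(i) == p.charAt(k)) k++;
            fail[i] = k;
        }
        return fail;
    }

    static void replaceAll(Reader in, Writer out, String pattern, String replacement)
            throws IOException {
        if (pattern.isEmpty()) throw new IllegalArgumentException("pattern must not be empty");
        int[] fail = failureTable(pattern);
        int matched = 0; // number of pattern chars currently held back as a partial match
        int c;
        while ((c = in.read()) != -1) {
            while (matched > 0 && c != pattern.charAt(matched)) {
                int shorter = fail[matched - 1];
                // The leading part of the held-back text can no longer start a match: emit it.
                out.write(pattern, 0, matched - shorter);
                matched = shorter;
            }
            if (c == pattern.charAt(matched)) {
                matched++;
            } else {
                out.write(c);
            }
            if (matched == pattern.length()) {
                out.write(replacement);
                matched = 0;
            }
        }
        out.write(pattern, 0, matched); // flush a trailing partial match unchanged
    }

    // Convenience wrapper for in-memory strings.
    static String filter(String input, String pattern, String replacement) {
        StringWriter out = new StringWriter();
        try {
            replaceAll(new StringReader(input), out, pattern, replacement);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out.toString();
    }
}
```

In practice the Reader would be wrapped in a BufferedReader, and the filtered output would be handed to the parser rather than collected in a String.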
 
Marshal
Posts: 28193
95
Eclipse IDE Firefox Browser MySQL Database

Jaikiran Pai wrote:Where is that xml content coming from into the file? Whatever is writing out that content would ideally have to fix it.



Yes, you should really just be rejecting that document. It isn't well-formed XML and the people who sent it to you should be made to stop doing that and to start sending well-formed XML. You should really insist on that quite firmly.
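If you want to reject such documents in code, a well-formedness check can be as small as the sketch below. The class and method names are hypothetical; it simply runs the text through the standard SAX parser and treats any SAXException as "not well-formed".

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

class WellFormedCheck {

    // Returns true if the parser accepts the text, false if it reports a parse error.
    static boolean isWellFormed(String xml) {
        try {
            SAXParserFactory.newInstance()
                    .newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), new DefaultHandler());
            return true;
        } catch (SAXException e) {
            // DefaultHandler rethrows fatal errors, e.g. a '$' inside an element name.
            return false;
        } catch (IOException | ParserConfigurationException e) {
            throw new IllegalStateException("Could not run the XML parser", e);
        }
    }
}
```

A document for which this returns false can then be sent back to whoever produced it.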
 
Frank van Roekel
Greenhorn
Posts: 3

Jaikiran Pai wrote:

If that's not possible, then in the code where you are trying to fix it, instead of reading it as XML, I would suggest that you read it as a plain file (using the File APIs) and do a simple replace on that particular element name (using the String APIs).



Thanks for your answer. I wrote code that used the File API, but then they told me (way too late) that I get the XML as a String instead of a file.
So now I have written a piece of code that changes the dollar sign into a unique string; after that, I iterate over the XML and remove the elements containing that unique string.
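Roughly, the idea can be sketched like this (not the exact code; the class name and the marker `__DOLLAR__` are placeholders, and the marker is assumed never to occur in the real data). Every `$` becomes the marker so the text parses, elements whose names contain the marker are removed, and any marker left in ordinary text is turned back into `$`.

```java
import java.io.StringReader;
import java.io.StringWriter;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class DollarElementRemover {

    // Placeholder marker: assumed not to appear anywhere in the incoming text.
    static final String MARKER = "__DOLLAR__";

    static String removeInvalidElements(String xml) {
        try {
            // Step 1: make the text parseable by swapping every '$' for the marker.
            String patched = xml.replace("$", MARKER);
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(patched)));

            // Step 2: collect the elements whose names contain the marker,
            // then remove them (collecting first avoids editing a live NodeList).
            NodeList all = doc.getElementsByTagName("*");
            List<Element> doomed = new ArrayList<>();
            for (int i = 0; i < all.getLength(); i++) {
                Element e = (Element) all.item(i);
                if (e.getTagName().contains(MARKER)) doomed.add(e);
            }
            for (Element e : doomed) {
                if (e.getParentNode() != null) e.getParentNode().removeChild(e);
            }

            // Step 3: serialize and restore '$' in text that was not removed.
            Transformer t = TransformerFactory.newInstance().newTransformer();
            t.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            t.transform(new DOMSource(doc), new StreamResult(out));
            return out.toString().replace(MARKER, "$");
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not repair the XML", e);
        }
    }
}
```

If the root element itself has a `$` in its name, this leaves an empty document, so that case would need separate handling.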

I think it does what it needs to do ;).
I hope my client thinks the same.

Anyway, thank you all!
 
Richard Tookey
Bartender
Posts: 1166
17
Netbeans IDE Java Linux
Of course, you have made sure that you replace only a '$' followed by '>', and that you don't modify the invalid elements so that they end up with the same name as existing valid elements!
 
Frank van Roekel
Greenhorn
Posts: 3

Richard Tookey wrote:Of course you have made sure that you replace only '$' followed by '>' and that you don't modify the invalid elements so that they have the same name as existing valid elements !



A dollar sign may never appear between the chevrons, right? So something like <te$t>1234</te$t> must also be removed?

And yes, I'm sure that I don't remove valid elements ;).

 
Rancher
Posts: 43081
77
I'd go with Paul's advice and reject the document. It's not XML, and any attempt to pretend that it is will likely end in tears sooner or later. For starters, "<result>" and "<result$>" seem to be different things; otherwise, why would they not both be named "<result>"? If you remove the "$", you're making them into the same thing, which may not be the right thing to do.

Or if it is not actually meant to be XML, then you can't use the JAXP APIs on it. Text search and replace would be more appropriate tools in that case. The String class has some methods that make this easy, assuming that this is the only transformation that you need to implement.
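As an illustration of the text-based route, here is a small sketch using String.replaceAll. The class name is made up, and the pattern assumes the offending elements have no attributes and are not nested inside an element of the same name; it deletes every element whose name contains a "$", contents included.

```java
class DollarElementStripper {

    // <name>...</name> where name contains at least one '$'.
    // (?s) lets '.' match line breaks; \1 requires the closing tag to repeat the same name.
    private static final String INVALID_ELEMENT =
            "(?s)<([A-Za-z_][\\w.-]*\\$[\\w.$-]*)>.*?</\\1>";

    static String strip(String text) {
        return text.replaceAll(INVALID_ELEMENT, "");
    }
}
```

A "$" that appears in ordinary text content, such as a price, is left alone because the pattern only matches it inside a tag name.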
 
Ranch Hand
Posts: 930
2
I think it's better to approach the team that is supplying the corrupt XML and get it fixed there.
 