Parsing an XML that contains the '&' character

 
Greenhorn
Posts: 21
Hello all,
I've just started working on a new project and I encountered an XML parsing problem.
A (very) short description of the project's design:
There is a servlet that accepts XML and passes it to a handler to process.
Everything is in UTF-8 encoding.

The problem is in that handler:
Suppose I have something like the following:

The characters method of the DefaultHandler seemed to split that value and was actually called 3 times.

The original code (which I started working on) had a String, tempVal, that was set in the characters method:

It was initialized to an empty string at the beginning of the startElement method.
Then, in endElement, tempVal was used to build the domain objects.

I created a small JUnit test and found a solution.
The solution is to concatenate each chunk onto tempVal in the characters method instead of overwriting it.

I would like to ask whether this is the correct solution, or whether there is a better one.

(Forgive me for the long post; I wanted to be as clear as possible.)
Here's the code (I could not attach a Java / text file):



Well, my question can also be put this way:
Is there a way to configure the parser so that it won't split the text at the '&'?


Thank you very much for any help
 
Author
Posts: 12617
One option would be to accept only well-formed XML, which won't have unescaped < and & characters in its text.
 
Ranch Hand
Posts: 2187
The ampersand character is a "special character" in XML-based markup languages. In order to include the ampersand character in an XML document, you must use the XML entity instead of the character itself. The entity is &amp;.
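To illustrate the point about the entity, here is a minimal sketch using the JDK's built-in SAX parser. The class name AmpersandCheck, the helper names escape and isWellFormed, and the sample element <name> are made up for this example and do not come from the original post. It only shows that a raw '&' is rejected as not well-formed, while the escaped form parses.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper class, for illustration only.
class AmpersandCheck {

    // Escapes the two characters that may not appear raw in element text.
    // '&' must be replaced first so the '&' in "&lt;" is not escaped again.
    static String escape(String raw) {
        return raw.replace("&", "&amp;").replace("<", "&lt;");
    }

    // Returns true if the parser accepts the document, false if it
    // reports a well-formedness error.
    static boolean isWellFormed(String xml) {
        try {
            SAXParserFactory.newInstance()
                    .newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), new DefaultHandler());
            return true;
        } catch (SAXException e) {
            return false; // e.g. a raw '&' that does not start an entity reference
        } catch (ParserConfigurationException | IOException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String raw = "Tom & Jerry"; // sample value, assumed for this sketch
        System.out.println(isWellFormed("<name>" + raw + "</name>"));
        System.out.println(isWellFormed("<name>" + escape(raw) + "</name>"));
    }
}
```

Escaping belongs on the side that produces the XML; the handler that receives it gets the text back with a plain '&'.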
 
Sheriff
Posts: 26776
If this is the FAQ where a SAX parser splits a text node into several parts and calls the characters() method once for each of them, then yes, everything you said was correct. And your solution was correct too. And no, you can't configure the parser to not do that. After all, the documentation does say it might do it and it doesn't cause any problems for applications that take that possibility into account.

However, if you use a StringBuilder instead of a String to combine the parts, you will find it has an append() method whose parameters match those of the characters() method exactly. It would be better to do it that way.
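The StringBuilder suggestion could look roughly like the sketch below. The original poster's code is not shown in this thread, so the class name TextCollectingHandler, the element name "name", and the sample document are all assumptions made for illustration. The buffer is reset in startElement, filled in characters (however many times it is called), and read in endElement.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical handler, for illustration only.
class TextCollectingHandler extends DefaultHandler {
    private final StringBuilder text = new StringBuilder();
    private final List<String> values = new ArrayList<>();

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        text.setLength(0); // start each element with an empty buffer
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        // May be called several times for one text node, so append, never assign.
        text.append(ch, start, length);
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        if ("name".equals(qName)) { // "name" is an assumed element name
            values.add(text.toString());
        }
    }

    List<String> values() {
        return values;
    }

    static List<String> parse(String xml) throws Exception {
        TextCollectingHandler handler = new TextCollectingHandler();
        SAXParserFactory.newInstance()
                .newSAXParser()
                .parse(new InputSource(new StringReader(xml)), handler);
        return handler.values();
    }

    public static void main(String[] args) throws Exception {
        // Prints [Tom & Jerry] regardless of how the parser chunks the text.
        System.out.println(parse("<item><name>Tom &amp; Jerry</name></item>"));
    }
}
```

Resetting in startElement is the simplest choice for elements that contain only text; mixed content (text interleaved with child elements) would need a different reset strategy.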
 
Eyal Golan
Greenhorn
Posts: 21
Thank you all for the answers.

Paul Clapham wrote:...However if you use a StringBuilder instead of a String to combine the parts, you will find it has an append() method which is perfect for the parameters of the characters() method. It would be better to do it that way.


Thanks for reminding me about StringBuilder. I used the String tempVal because that is what the existing code used...

And again, thank you all
 