
Problem SAX org.xml.sax.SAXParseException; lineNumber: 13; columnNumber: 69; Character reference "&#

 
sahar eb
Ranch Hand
Posts: 38
Hi,

I have a String as follows:


I want to return the content between the <documentFullPageContent> tags, which is HTML.
Here is my code:




And here is the error I get:



Can you tell me what I am doing wrong, and how I can solve it?

Thank you!!

P.S. I just found that builder contains only part of the HTML, not all of it, even though I am using builder.append.
 
Paul Clapham
Sheriff
Posts: 21322
If you're going to parse HTML with an XML parser, then you have to make sure your HTML is also well-formed XML. Yours isn't, so you can't use an XML parser to parse it. So you have two options: (1) Use an HTML parser instead; (2) Run an HTML cleanup product like HTMLTidy to convert it to XHTML.

As for the error message, it looks like you have an invalid character reference; you should examine the document to find out what it is. My guess is that it's in "garant?a" because my browser and/or the forum software identifies the second-to-last character as something it can't understand.
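
To check quickly whether a fragment is well-formed XML, and where the parser gives up, something like the following might help. It is only a minimal sketch using the JDK's built-in parser: the class name CharRefCheck, the method firstXmlError, and the two sample strings are made up for illustration and are not taken from the original document, whose offending reference is not shown in full.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXParseException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper, not from the original post.
final class CharRefCheck {

    /** Returns null if the text parses as XML, otherwise a description of the first error. */
    public static String firstXmlError(String text) {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            // DefaultHandler rethrows fatal errors instead of printing them to stderr.
            builder.setErrorHandler(new DefaultHandler());
            builder.parse(new InputSource(new StringReader(text)));
            return null;
        } catch (SAXParseException e) {
            // Same position information that appears in the exception text.
            return "line " + e.getLineNumber() + ", column " + e.getColumnNumber()
                    + ": " + e.getMessage();
        } catch (Exception e) {
            return e.toString();
        }
    }

    public static void main(String[] args) {
        // Assumed sample input: a legal character reference (U+00ED).
        System.out.println(firstXmlError("<p>garant&#237;a</p>"));
        // Assumed sample input: a reference to a control character that XML 1.0 forbids.
        System.out.println(firstXmlError("<p>garant&#11;a</p>"));
    }
}
```

If the second kind of input is what the document contains, no XML parser will accept it as it stands, which is why an HTML parser or a cleanup step is needed first.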
 
sahar eb
Ranch Hand
Posts: 38
Hi Paul,
Thank you so much for replying. Yes, I found that SAX doesn't recognize the letter you mentioned, or "&#" and other special characters in general, so I extracted everything between the tags using a regex.

Anyway, thanks again.
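
For anyone reading later, the regex approach might look roughly like this. The exact expression used was not posted, so this is only a sketch under assumptions: the class name TagContentExtractor and the method extract are hypothetical, and it assumes the element appears once, has no attributes, and is not nested.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not the code actually used in this thread.
final class TagContentExtractor {

    // DOTALL lets '.' match line breaks; the reluctant '.*?' stops at the first closing tag.
    private static final Pattern CONTENT = Pattern.compile(
            "<documentFullPageContent>(.*?)</documentFullPageContent>", Pattern.DOTALL);

    /** Returns the raw text between the tags, or null if the tags are not found. */
    public static String extract(String input) {
        Matcher matcher = CONTENT.matcher(input);
        return matcher.find() ? matcher.group(1) : null;
    }

    public static void main(String[] args) {
        // Assumed sample input; the real String from the first post is not shown.
        String input = "<result><documentFullPageContent><p>Hello<br>world</p>"
                + "</documentFullPageContent></result>";
        System.out.println(extract(input));
    }
}
```

Because the content is treated as plain text, malformed HTML and odd character references inside it no longer matter. The trade-off is that a regex knows nothing about XML structure, so it would break if the tag gained attributes or appeared inside a comment or CDATA section.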
 