
Parsing XML elements

 
Melanie Walsh
Greenhorn
Posts: 20
I have a class that implements ContentHandler; its characters() method looks for tags and then extracts the content.
It all works well except when the content contains a '[' or ']'. In that case it returns just a ']'. Can anyone tell me why? Is this a special character?
This is the code:


I would be very grateful for any suggestions.

P.S.:
Fixed the code tags. UBB tags use [ ] rather than the angle brackets used in XML. Now you really hate those [ and ] brackets, don't you!
[ April 21, 2005: Message edited by: Madhav Lakkapragada ]
 
Stan James
(instanceof Sidekick)
Ranch Hand
Posts: 8791
Sorry, I don't know about the square brackets. There are a handful of characters that choke the parser, but I haven't had trouble with any disappearing.

I want to invite you to look at another potential issue, though. The characters() method is not guaranteed to give you the entire contents of an element in one call. You can imagine the parser buffering input somewhere under the covers (it might even be true). If the end of one buffer falls in the middle of an element's text, the parser might call you with the characters it has so far, read the next buffer, and call you again with the rest. I learned this the hard way with a parser in another language that worked in 2048-byte chunks.
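A minimal sketch of the behaviour described above, assuming the JDK's built-in SAX parser (javax.xml.parsers): the handler only records every fragment passed to characters(). How many fragments one text node produces depends on the parser implementation, so treat the printed count as illustrative. Some parsers may also end a chunk at an entity reference such as &amp;amp; or at a markup-related character such as ']'; if a handler keeps only the most recent chunk, that could leave exactly the lone ']' described in the question, though nothing in this thread confirms it. The class name ChunkLogger is made up for this example.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical demo class: records every chunk passed to characters().
public class ChunkLogger extends DefaultHandler {
    private final List<String> chunks = new ArrayList<>();

    @Override
    public void characters(char[] ch, int start, int length) {
        // Only ch[start .. start + length) belongs to this call; the array may be
        // the parser's internal buffer, so copy the slice out right away.
        chunks.add(new String(ch, start, length));
    }

    // Parses an XML string and returns the chunks in the order they arrived.
    public static List<String> parse(String xml) throws Exception {
        ChunkLogger handler = new ChunkLogger();
        SAXParserFactory.newInstance()
                .newSAXParser()
                .parse(new InputSource(new StringReader(xml)), handler);
        return handler.chunks;
    }

    public static void main(String[] args) throws Exception {
        // A single text node; the parser may still split it across several calls.
        List<String> chunks = parse("<item>Item [1] &amp; more</item>");
        System.out.println(chunks.size() + " call(s): " + chunks);
    }
}
```

Joining the chunks always gives back "Item [1] & more"; only the way it is sliced up can vary.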

My solution has been to have the characters() method append what it gets to a member variable string, and to use the string in the endElement method instead. Seems to work so long as you don't have nested tags in the middle of the text like a bold word in the middle of an HTML paragraph.
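The approach above, sketched as a DefaultHandler (the SAX helper class that implements ContentHandler with empty methods): characters() only appends to a StringBuilder member, and the text is read in endElement(). The class name TextCollector and the choice to collect values into a map keyed by element name are assumptions for illustration. As noted above, this simple version loses text in mixed content, because a nested start tag resets the buffer.

```java
import java.io.StringReader;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical example: collects the text of each leaf element by name.
public class TextCollector extends DefaultHandler {
    // Member buffer that characters() appends to across calls.
    private final StringBuilder text = new StringBuilder();
    private final Map<String, String> values = new LinkedHashMap<>();

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        text.setLength(0); // new element: start with an empty buffer
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        text.append(ch, start, length); // append, never overwrite
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        // The element's text is complete only here, however many chunks it came in.
        String value = text.toString();
        if (!value.trim().isEmpty()) { // skip containers holding only whitespace
            values.put(qName, value);
        }
        text.setLength(0);
    }

    public static Map<String, String> parse(String xml) throws Exception {
        TextCollector handler = new TextCollector();
        SAXParserFactory.newInstance()
                .newSAXParser()
                .parse(new InputSource(new StringReader(xml)), handler);
        return handler.values;
    }

    public static void main(String[] args) throws Exception {
        Map<String, String> v = parse("<record><title>Item [1]</title><note>a]b[c</note></record>");
        System.out.println(v); // {title=Item [1], note=a]b[c}
    }
}
```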

Let me know what you learn on the square brackets!
 
Ilja Preuss
author
Sheriff
Posts: 14112
Moving to XML forum...
 
Madhav Lakkapragada
Ranch Hand
Posts: 5040
I am led to believe it is most probably a parser issue.
I tried a simple example with the standard Echo.java program from the SAX tutorials, and it printed the square brackets as text without dropping or truncating them.

Try echoing your input with this code and see whether the brackets come through.
When I ran my sample, I used the parsers bundled with J2SE 1.4.

This source code is from the SAX tutorials (courtesy java.sun.com).


This code also illustrates what Stan has suggested in his post.
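For comparison, a minimal echo-style sketch of the same idea (this is not the tutorial's Echo.java; the class name EchoSketch and the event format are made up for illustration). Text is buffered in characters() and flushed at every start and end tag, so text on both sides of a nested element is kept, which avoids the mixed-content limitation Stan mentions.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical echo-style handler: logs element events and buffered text.
public class EchoSketch extends DefaultHandler {
    private final List<String> events = new ArrayList<>();
    private final StringBuilder textBuffer = new StringBuilder();

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        flushText(); // text before a nested tag, e.g. "Some " before <b>
        events.add("START " + qName);
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        flushText(); // text just before this closing tag
        events.add("END " + qName);
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        textBuffer.append(ch, start, length); // may be called several times per text run
    }

    // Report all text gathered since the previous tag as a single event.
    private void flushText() {
        if (textBuffer.length() > 0) {
            events.add("TEXT \"" + textBuffer + "\"");
            textBuffer.setLength(0);
        }
    }

    public static List<String> echo(String xml) throws Exception {
        EchoSketch handler = new EchoSketch();
        SAXParserFactory.newInstance()
                .newSAXParser()
                .parse(new InputSource(new StringReader(xml)), handler);
        return handler.events;
    }

    public static void main(String[] args) throws Exception {
        for (String e : echo("<p>Some <b>bold</b> text [1]</p>")) {
            System.out.println(e);
        }
    }
}
```

Running main prints START p, TEXT "Some ", START b, TEXT "bold", END b, TEXT " text [1]", END p, one per line, with the brackets intact.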
Regards.

- m
 