SAX parser issue: characters() callback method being called twice

 
Ranch Hand
Posts: 167
Hi all,
I wrote a simple SAX parser to parse a document with the following format:
<data>
<element>
<record>
</record>
<record>
</record>
</element>
</data>

The data between the record tags consists of Unicode (Chinese) characters.
However, the characters() callback method is sometimes called twice for a single record. It seems totally random; I can't predict when it happens.
So if my Unicode data is 1234 4567 1234, it is sometimes read as
1234 4 and then as 567 1234,
so when I convert the Unicode back to a string, I get special characters.
I've checked the XML before sending it; it is valid and well formed.
The converted Unicode is added to an ArrayList.
I'd be thankful if someone could shed some light on this.

In the meantime, I've added two int variables. I increment one when the startElement() method is called and the other when the characters() method is called. Before converting the Unicode to a string, I check whether the two are equal; if they are not, I remove the last element added to the ArrayList and concatenate it with the current one. This has solved my problem, but I'd like to know the reason for the behaviour.

Jhakda Velu
 
Author and all-around good cowpoke
Posts: 13078

character() call back method is called twice at times.



The characters() method may be called any number of times within a single element because the SAX parser only handles one bufferload of input characters at a time.

It is up to the programmer to assemble the text properly.

Bill
 
Jhakda Velu
Ranch Hand
Posts: 167
Hi
Thanks a lot for the reply. It has cleared up my misconception. If there is a better way of handling this than the one I mentioned, I'd welcome it. I'm adding the part of the code that contains my logic.
Thanks a lot.
Jhakda



// Class-level variables
int iStartCallCounter = 0, iCharCallCounter = 0;
private String value = "";
private String oldValue = "";

public void startElement(String uri, String x, String qName, Attributes attributes) {
    // additional code
    iStartCallCounter++;
    // additional code
}

public void characters(char[] ch, int start, int length) {
    // additional code
    iCharCallCounter++;
    // The counters differ when characters() has run more than once for one startElement()
    if (iStartCallCounter != iCharCallCounter) {
        value = new String(ch, start, length);
        value = oldValue.concat(value);
        oldValue = value;
        iStartCallCounter = 0;
        iCharCallCounter = 0;
    }
}
 
William Brogden
Author and all-around good cowpoke
Posts: 13078
I normally have a StringBuffer or StringBuilder reference that gets a new instance when the appropriate startElement() is hit and gets additions from each call to the characters() method.

When endElement occurs I use toString to get the assembled characters and then work on the logic. It appears you are trying to do logic inside the characters() method - there is no reason to do that, wait for endElement to do your logic.
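For readers who want to see that pattern spelled out, here is a minimal sketch of it. The class name RecordCollector and the parse() helper are made up for illustration, and the element name "record" is taken from the sample document in the first post; a real handler would add its own logic in endElement().

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical handler: collects the complete text of every <record> element.
class RecordCollector extends DefaultHandler {
    private final List<String> records = new ArrayList<>();
    private StringBuilder text; // non-null only while inside a <record>

    @Override
    public void startElement(String uri, String localName, String qName, Attributes attributes) {
        if ("record".equals(qName)) {
            text = new StringBuilder(); // fresh buffer for each record
        }
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        // May be called several times per element, so only append here.
        if (text != null) {
            text.append(ch, start, length);
        }
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        if ("record".equals(qName) && text != null) {
            records.add(text.toString()); // the text is complete only now
            text = null;
        }
    }

    public List<String> getRecords() {
        return records;
    }

    // Convenience helper (an assumption of this sketch): parse an XML string.
    public static List<String> parse(String xml) {
        RecordCollector handler = new RecordCollector();
        try {
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), handler);
        } catch (Exception e) {
            throw new IllegalStateException("Parsing failed", e);
        }
        return handler.getRecords();
    }
}
```

Because the buffer is replaced in startElement() and read only in endElement(), it does not matter whether the parser delivers "1234 4567 1234" in one call or as "1234 4" followed by "567 1234"; no call counters are needed.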
 
Jhakda Velu
Ranch Hand
Posts: 167
Hi
That's a really cool way to do it. Thanks a ton!
So in the characters() method, I keep appending the values I get to the StringBuffer.
Once the end element is hit, I do the processing and then re-initialize the buffer to an empty string, right?
Actually, I was fixated on the impression that the characters() method is called only once for every call to startElement().


Jhakda
 
Ranch Hand
Posts: 155
Thanks everyone. The code below worked for me.





 
Rancher
Posts: 42974

Anand Gondhiya wrote: value = value + new String(ch, start, length).trim();


The call to trim is dangerous. What if you have an element that contains "Anand Gondhiya", and the parser decides to break it up before or after the space character? Then you'd be left with "AnandGondhiya" - not what you wanted.
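A small sketch of that hazard, simulating two characters() calls with plain strings rather than a real parser. The class and method names are invented for illustration, and trimming once at the end is only one possible alternative, assuming surrounding whitespace is unwanted at all.

```java
// Hypothetical demo class; no SAX parser is involved, the chunks are passed in directly.
class TrimPitfall {
    // Mirrors the quoted line: trim every chunk, then concatenate.
    static String trimEachChunk(String... chunks) {
        String value = "";
        for (String chunk : chunks) {
            char[] ch = chunk.toCharArray();
            value = value + new String(ch, 0, ch.length).trim();
        }
        return value;
    }

    // Appends the chunks untouched and trims the assembled text once.
    static String trimAtEnd(String... chunks) {
        StringBuilder sb = new StringBuilder();
        for (String chunk : chunks) {
            sb.append(chunk);
        }
        return sb.toString().trim();
    }

    public static void main(String[] args) {
        // The split falls right after the space character.
        System.out.println(trimEachChunk("Anand ", "Gondhiya")); // AnandGondhiya
        System.out.println(trimAtEnd("Anand ", "Gondhiya"));     // Anand Gondhiya
    }
}
```

Trimming inside characters() loses the space whenever the split happens next to it; trimming the assembled string in endElement() keeps interior whitespace intact.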
 
Ulf Dittmer
Rancher
Posts: 42974
76
Since I just addressed that very same question today as well, I figure this is a FAQ. So I took the liberty of adding William's explanation to the XML FAQ: http://faq.javaranch.com/java/XmlFaq
 