SAX parser issue: characters() callback method being called twice

 
Ranch Hand
Posts: 167
Hi All
I wrote a simple SAX parser to parse my document having the following format
<data>
<element>
<record>
</record>
<record>
</record>
</element>
</data>

The data between the record tags is Unicode text (Chinese characters).
However, the characters() callback method is sometimes called twice for a single element. It seems random; I can't predict when it happens.
So if my Unicode data is 1234 4567 1234, it is sometimes read as
1234 4 and then as 567 1234,
so when I convert the Unicode back to a string, I get special characters.
I've checked the XML before sending; it is proper and well formed.
The converted Unicode is added to an ArrayList.
I'd be thankful if someone could shed some light on this.

In the meantime, I've added two int variables. I increment one of them when the startElement() method is called and the other when the characters() method is called. Before converting the Unicode to a string, I check whether the two are equal; if not, I remove the last element added to the ArrayList and concatenate it with the current one. This has solved my problem, but I want to know the reason for this behaviour.

Jhakda Velu
 
Author and all-around good cowpoke
Posts: 13078
6

character() call back method is called twice at times.



The characters() method may be called any number of times within a single element because the SAX parser only handles one bufferload of input characters at a time.

It is up to the programmer to assemble the text properly.
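To see the splitting for yourself, a small sketch like the following records every chunk that characters() delivers. The class name ChunkLogger and the sample XML are made up for illustration; how many chunks you get depends on the parser implementation and its buffering, so the only thing you can rely on is that the chunks joined together give the full text.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper: records each chunk passed to characters().
class ChunkLogger extends DefaultHandler {
    final List<String> chunks = new ArrayList<>();

    @Override
    public void characters(char[] ch, int start, int length) {
        // Only the slice [start, start + length) of ch belongs to this call.
        chunks.add(new String(ch, start, length));
    }

    // Parses the given XML string and returns the chunks in delivery order.
    static List<String> chunksOf(String xml) {
        try {
            ChunkLogger handler = new ChunkLogger();
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), handler);
            return handler.chunks;
        } catch (Exception e) {
            throw new IllegalStateException("parsing failed", e);
        }
    }

    public static void main(String[] args) {
        List<String> chunks = chunksOf("<record>AT&amp;T and more text</record>");
        // The number of calls is parser dependent; the joined text is not.
        System.out.println(chunks.size() + " call(s): " + chunks);
        System.out.println("joined: " + String.join("", chunks));
    }
}
```

Many parsers happen to split at entity references such as &amp;amp; as well as at buffer boundaries, but that is an implementation detail and not something to depend on.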

Bill
 
Jhakda Velu
Ranch Hand
Posts: 167
Hi
Thanks a lot for the reply. It has cleared my misconception. Any better way of going about the issue than the one I mentioned is welcome. I'm adding the part of code having my logic.
Thanks a lot.
Jhakda



// Class-level variables
int iStartCallCounter = 0, iCharCallCounter = 0;
private String value = "";
private String oldValue = "";

public void startElement(String uri, String x, String qName, Attributes attributes) {
    // additional code
    iStartCallCounter++;
    // additional code
}

public void characters(char[] ch, int start, int length) {
    // additional code
    iCharCallCounter++;
    if (iStartCallCounter != iCharCallCounter) {
        value = new String(ch, start, length);
        value = oldValue.concat(value);
        oldValue = value;
        iStartCallCounter = 0;
        iCharCallCounter = 0;
    }
}
 
William Brogden
Author and all-around good cowpoke
Posts: 13078
6
I normally have a StringBuffer or StringBuilder reference that gets a new instance when the appropriate startElement() is hit and gets additions from each call to the characters() method.

When endElement occurs I use toString to get the assembled characters and then work on the logic. It appears you are trying to do logic inside the characters() method - there is no reason to do that, wait for endElement to do your logic.
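A minimal sketch of that pattern, assuming the data/element/record layout from the first post. The class name RecordHandler, the records list, and the parse helper are invented for illustration, and the parser is the default non-namespace-aware one, so the element name arrives in qName.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical handler: collects the full text of every <record> element.
class RecordHandler extends DefaultHandler {
    final List<String> records = new ArrayList<>();
    private StringBuilder text; // non-null only while inside a <record>

    @Override
    public void startElement(String uri, String localName, String qName, Attributes attributes) {
        if ("record".equals(qName)) {
            text = new StringBuilder(); // fresh buffer for each record
        }
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        if (text != null) {
            text.append(ch, start, length); // just accumulate, no logic here
        }
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        if ("record".equals(qName)) {
            records.add(text.toString()); // the text is complete only now
            text = null;
        }
    }

    // Parses an XML string and returns the text of each record in order.
    static List<String> parse(String xml) {
        try {
            RecordHandler handler = new RecordHandler();
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), handler);
            return handler.records;
        } catch (Exception e) {
            throw new IllegalStateException("parsing failed", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<data><element><record>1234 4567 1234</record>"
                + "<record>AT&amp;T</record></element></data>";
        System.out.println(parse(xml));
    }
}
```

Setting the buffer back to null in endElement() also means whitespace between elements is ignored, because characters() only appends while a record is open.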
 
Jhakda Velu
Ranch Hand
Posts: 167
Hi
That's a really cool way to do it. Thanks a ton!
So in the characters() method, I keep appending the values I get to the StringBuffer;
once the end element is hit, I do the processing and at the end re-initialize the buffer to an empty string, right?
Actually, I was fixated on the impression that the characters() method is called only once for every call to startElement().


Jhakda
 
Ranch Hand
Posts: 155
Thanks everyone. code below worked for me.





 
Rancher
Posts: 43081
77

Anand Gondhiya wrote:value = value + new String(ch, start, length).trim();


The call to trim is dangerous. What if you have an element that contains "Anand Gondhiya", and the parser decides to break it up before or after the space character? Then you'd be left with "AnandGondhiya" - not what you wanted.
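The difference can be shown without a parser at all. In this sketch the chunks are hand-made stand-ins for what successive characters() calls might deliver, and the class and method names are invented for illustration.

```java
// Hypothetical demo: trimming each chunk versus trimming the assembled text.
class TrimDemo {
    // Mirrors value = value + new String(ch, start, length).trim()
    static String trimEachChunk(String... chunks) {
        String value = "";
        for (String chunk : chunks) {
            value = value + chunk.trim(); // drops a space at a chunk boundary
        }
        return value;
    }

    // Appends the chunks untouched and trims once, e.g. in endElement().
    static String trimOnce(String... chunks) {
        StringBuilder sb = new StringBuilder();
        for (String chunk : chunks) {
            sb.append(chunk);
        }
        return sb.toString().trim();
    }

    public static void main(String[] args) {
        // Assumed split: the parser breaks the text right after the space.
        System.out.println(trimEachChunk("  Anand ", "Gondhiya\n")); // prints AnandGondhiya
        System.out.println(trimOnce("  Anand ", "Gondhiya\n"));      // prints Anand Gondhiya
    }
}
```

If leading or trailing whitespace has to go, trimming the assembled string once at the end of the element is the safe place to do it.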
 
Ulf Dittmer
Rancher
Posts: 43081
77
Since I just addressed that very same question today as well, I figure this is a FAQ. So I took the liberty of adding William's explanation to the XML FAQ: http://faq.javaranch.com/java/XmlFaq
 