I recommend Sun's very own parser! It is designed so that malformed HTML documents are 'fixed' by adding the missing tags. Its architecture is event-driven, and it gives good performance.
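Here is a minimal sketch of that event-driven style. It assumes the parser meant here is the one bundled with the JDK in javax.swing.text.html.parser (ParserDelegator, which reports events to an HTMLEditorKit.ParserCallback). TagEventDemo and tagEvents are hypothetical names used only for illustration. The parser pushes every start and end tag to the callback, including tags it implies to repair the input. On deliberately malformed input such as <p>one<p>two, you should see a closing </p> reported before the second <p>, plus implied <body> tags.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;
import javax.swing.text.MutableAttributeSet;
import javax.swing.text.html.HTML;
import javax.swing.text.html.HTMLEditorKit;
import javax.swing.text.html.parser.ParserDelegator;

// Hypothetical demo class: records the tag events the JDK's HTML parser reports.
public class TagEventDemo {

    // Returns start/end tag events in the order the parser reports them.
    public static List<String> tagEvents(String html) {
        List<String> events = new ArrayList<>();
        HTMLEditorKit.ParserCallback callback = new HTMLEditorKit.ParserCallback() {
            @Override
            public void handleStartTag(HTML.Tag t, MutableAttributeSet a, int pos) {
                events.add("<" + t + ">");
            }

            @Override
            public void handleEndTag(HTML.Tag t, int pos) {
                events.add("</" + t + ">");
            }
        };
        try (Reader reader = new StringReader(html)) {
            // true: ignore any charset declared in a <meta> tag
            new ParserDelegator().parse(reader, callback, true);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return events;
    }

    public static void main(String[] args) {
        // Malformed input: no <html>/<body>, and neither <p> is closed.
        for (String event : tagEvents("<p>one<p>two")) {
            System.out.println(event);
        }
    }
}
```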
You should (or at least could) implement the callback above.
A default implementation exists, but I think you could (and should) improve on it.
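As an illustration of improving on the default, here is a sketch of a custom callback. It again assumes the JDK's HTMLEditorKit.ParserCallback, whose handler methods do nothing by default. LinkCollector, parse, getLinks and getText are hypothetical names. The callback keeps only the href values of <a> tags and the text chunks, and ignores every other event.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;
import javax.swing.text.MutableAttributeSet;
import javax.swing.text.html.HTML;
import javax.swing.text.html.HTMLEditorKit;
import javax.swing.text.html.parser.ParserDelegator;

// Hypothetical callback: overrides two of ParserCallback's no-op handlers.
public class LinkCollector extends HTMLEditorKit.ParserCallback {
    private final List<String> links = new ArrayList<>();
    private final StringBuilder text = new StringBuilder();

    @Override
    public void handleStartTag(HTML.Tag t, MutableAttributeSet a, int pos) {
        // Keep the href of every <a> start tag that has one.
        if (t == HTML.Tag.A) {
            Object href = a.getAttribute(HTML.Attribute.HREF);
            if (href != null) {
                links.add(href.toString());
            }
        }
    }

    @Override
    public void handleText(char[] data, int pos) {
        // Join text chunks with single spaces.
        if (text.length() > 0) {
            text.append(' ');
        }
        text.append(data);
    }

    public List<String> getLinks() { return links; }

    public String getText() { return text.toString(); }

    // Parses one document with a new collector, so results never mix across documents.
    public static LinkCollector parse(String html) {
        LinkCollector collector = new LinkCollector();
        try {
            new ParserDelegator().parse(new StringReader(html), collector, true);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return collector;
    }

    public static void main(String[] args) {
        LinkCollector c = parse("<a href=\"a.html\">first</a> <b>unclosed <a href=b.html>second</a>");
        System.out.println(c.getLinks());
        System.out.println(c.getText());
    }
}
```

Creating a fresh collector per document keeps per-document state from piling up when you run it over a large batch.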
I have used it extensively and it is quite good: I used it to parse hundreds of thousands of documents (!) with no memory leaks.
If you have any more questions, I would be happy to help.
Azriel