
Ignoring ParsingExceptions for quotes

 
Mustafa Garhi
Ranch Hand
Posts: 111
Hi,

I used the JAXP library to convert HTML to CSV using an XSL file.

The source file is huge and has unquoted attributes, so the transform fails with a javax.xml.transform.TransformerException.

I tried to find a transformer that ignores the parse errors, but couldn't.

If there are no other options, I am thinking of tidying the HTML files with some Java code first.

Here is the JAXP code:



Please suggest how I can get around this issue.

Thanks in advance.
 
William Brogden
Author and all-around good cowpoke
Rancher
Posts: 13073
I don't think it is possible or advisable to try to recover from SAXParseExceptions.
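To see where that exception comes from: a JAXP StreamSource is parsed as XML, and XML requires every attribute value to be quoted, so markup like <td class=a> is rejected by the parser before the stylesheet ever runs. Below is a minimal sketch using only the standard javax.xml.transform API; it is not the original poster's code (which wasn't included), and the class name and the tiny stylesheet are made up for illustration.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

public class UnquotedAttributeDemo {

    // Illustrative stylesheet: one CSV line per <tr>, td values separated by commas.
    static final String XSL =
          "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
        + "<xsl:output method='text'/>"
        + "<xsl:template match='/'>"
        + "<xsl:for-each select='//tr'>"
        + "<xsl:for-each select='td'><xsl:value-of select='.'/>"
        + "<xsl:if test='position() != last()'>,</xsl:if></xsl:for-each>"
        + "<xsl:text>&#10;</xsl:text>"
        + "</xsl:for-each>"
        + "</xsl:template>"
        + "</xsl:stylesheet>";

    /** Runs the stylesheet over the input; returns the text output, or null if it failed. */
    static String transform(String html) {
        try {
            Transformer t = TransformerFactory.newInstance()
                    .newTransformer(new StreamSource(new StringReader(XSL)));
            StringWriter out = new StringWriter();
            // The source is parsed as XML here; an unquoted attribute is a fatal parse error.
            t.transform(new StreamSource(new StringReader(html)), new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            System.err.println("Transform failed: " + e.getMessage());
            return null;
        }
    }

    public static void main(String[] args) {
        String quoted   = "<html><body><table><tr><td class=\"a\">1</td><td>2</td></tr></table></body></html>";
        String unquoted = "<html><body><table><tr><td class=a>1</td><td>2</td></tr></table></body></html>";
        System.out.print(transform(quoted));     // prints 1,2
        System.out.println(transform(unquoted)); // prints null (the parse error goes to stderr)
    }
}
```

The only difference between the two inputs is the quotes around "a", which is why a transformer option alone can't help: the failure happens in the XML parser, not in the XSLT step.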

You are going to have to do some sort of preliminary cleanup. JTidy might be able to handle it.

If the input file problems are regular, you might be able to scan it as text and patch the missing attribute quotes to a new file.
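A minimal sketch of that text-patching idea, assuming the only problem is unquoted attribute values and that the files are UTF-8. It is regex-based, not a real HTML parser: the class name is hypothetical, it does not skip script blocks, and tags containing stray quote characters are left untouched. Quoting alone also won't make HTML well-formed XML if there are other issues (unclosed br tags, &nbsp; entities and so on), so JAXP may still reject the patched file.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class QuoteAttributes {

    // A start tag: '<' + letter, then plain chars or whole quoted strings, up to '>'.
    private static final Pattern TAG =
            Pattern.compile("<[A-Za-z](?:[^>\"']|\"[^\"]*\"|'[^']*')*>");

    // name = value, where value is double-quoted, single-quoted, or bare.
    private static final Pattern ATTR = Pattern.compile(
            "(\\s[A-Za-z_:][-A-Za-z0-9_:.]*\\s*=\\s*)(\"[^\"]*\"|'[^']*'|[^\\s\"'>]+)");

    /** Wraps bare attribute values in double quotes; quoted values are kept as they are. */
    static String fixTag(String tag) {
        Matcher m = ATTR.matcher(tag);
        StringBuffer sb = new StringBuffer();
        while (m.find()) {
            String value = m.group(2);
            char first = value.charAt(0);
            String quoted = (first == '"' || first == '\'') ? value : "\"" + value + "\"";
            m.appendReplacement(sb, Matcher.quoteReplacement(m.group(1) + quoted));
        }
        m.appendTail(sb);
        return sb.toString();
    }

    /** Applies fixTag to every start tag; text and end tags are copied unchanged. */
    static String quoteAttributes(String html) {
        Matcher tags = TAG.matcher(html);
        StringBuffer out = new StringBuffer();
        while (tags.find()) {
            tags.appendReplacement(out, Matcher.quoteReplacement(fixTag(tags.group())));
        }
        tags.appendTail(out);
        return out.toString();
    }

    // Usage: java QuoteAttributes in.html out.html (assumes UTF-8 files)
    public static void main(String[] args) throws IOException {
        if (args.length < 2) {
            System.err.println("usage: QuoteAttributes <in.html> <out.html>");
            return;
        }
        String html = new String(Files.readAllBytes(Paths.get(args[0])), StandardCharsets.UTF_8);
        Files.write(Paths.get(args[1]), quoteAttributes(html).getBytes(StandardCharsets.UTF_8));
    }
}
```

Matching quoted values as a unit (rather than just looking for name=value anywhere) keeps text like title="a b=c" from being mangled.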

Another possibility would be to drop XSLT and scan as text, writing the CSV directly.
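If you go the no-XSLT route, one option that stays entirely inside the JDK is javax.swing.text.html.parser.ParserDelegator, a lenient HTML 3.2 parser that accepts unquoted attributes, so no cleanup pass is needed. This is a swapped-in technique rather than anything named above, and the sketch assumes the records sit in ordinary table rows (one tr per record, one td or th per field); the class name and the simple CSV quoting rule are illustrative.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;
import javax.swing.text.MutableAttributeSet;
import javax.swing.text.html.HTML;
import javax.swing.text.html.HTMLEditorKit;
import javax.swing.text.html.parser.ParserDelegator;

public class HtmlTableToCsv {

    /** Collects the trimmed text of every td/th, one list per tr. */
    static List<List<String>> readRows(Reader html) throws IOException {
        final List<List<String>> rows = new ArrayList<>();
        HTMLEditorKit.ParserCallback callback = new HTMLEditorKit.ParserCallback() {
            private List<String> row;      // current row, null outside a tr
            private StringBuilder cell;    // current cell text, null outside a td/th

            @Override
            public void handleStartTag(HTML.Tag tag, MutableAttributeSet attrs, int pos) {
                if (tag == HTML.Tag.TR) {
                    row = new ArrayList<>();
                } else if ((tag == HTML.Tag.TD || tag == HTML.Tag.TH) && row != null) {
                    cell = new StringBuilder();
                }
            }

            @Override
            public void handleText(char[] data, int pos) {
                if (cell != null) cell.append(data);
            }

            @Override
            public void handleEndTag(HTML.Tag tag, int pos) {
                if ((tag == HTML.Tag.TD || tag == HTML.Tag.TH) && cell != null) {
                    row.add(cell.toString().trim());
                    cell = null;
                } else if (tag == HTML.Tag.TR && row != null) {
                    rows.add(row);
                    row = null;
                }
            }
        };
        new ParserDelegator().parse(html, callback, true);
        return rows;
    }

    /** Quotes a field if it contains a comma, quote or line break, doubling inner quotes. */
    static String csvField(String s) {
        if (s.indexOf(',') >= 0 || s.indexOf('"') >= 0 || s.indexOf('\n') >= 0 || s.indexOf('\r') >= 0) {
            return "\"" + s.replace("\"", "\"\"") + "\"";
        }
        return s;
    }

    static String toCsv(List<List<String>> rows) {
        StringBuilder sb = new StringBuilder();
        for (List<String> row : rows) {
            for (int i = 0; i < row.size(); i++) {
                if (i > 0) sb.append(',');
                sb.append(csvField(row.get(i)));
            }
            sb.append('\n');
        }
        return sb.toString();
    }

    /** Convenience wrapper: HTML string in, CSV string out. */
    static String convert(String html) {
        try {
            return toCsv(readRows(new StringReader(html)));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.print(convert("<table><tr><td class=x>Smith, J</td><td>Paris</td></tr></table>"));
    }
}
```

For the 300 files you would loop over them, pass a FileReader (or a reader with the right charset) to readRows, and append each file's rows to one output file.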

How large a "huge" file are we talking about?

Bill
 
Mustafa Garhi
Ranch Hand
Posts: 111
Thanks man,

Around 30k records spread over 300 HTML files, so roughly 100 records per file, plus some JavaScript and other UI code on each page.

Anyway, I think tidying up the HTML is the best option.

Thanks again.
 
William Brogden
Author and all-around good cowpoke
Rancher
Posts: 13073
It would be great if you could post your solution once you get this working. We get similar questions frequently, and your solution could help a lot of people.

Thanks
Bill
 
Mustafa Garhi
Ranch Hand
Posts: 111
I'll sure do that once I'm done.

Thanks
 