Ignoring ParsingExceptions for quotes

 
Ranch Hand
Posts: 111
Hi,

I used the JAXP library to convert HTML to CSV using an XSL file.

The source file is huge and has unquoted attributes, so the transformation fails with a javax.xml.transform.TransformerException.

I looked for a transformer that ignores the parsing errors but could not find one.

Now I am thinking of tidying the HTML file in Java code first, if there is no other option.

Here is the JAXP code:



Please suggest how I can get around the issue.

Thanks in advance.
 
Author and all-around good cowpoke
Posts: 13078
I don't think it is possible or advisable to try to recover from SAXParseExceptions.
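
To illustrate why: an unquoted attribute value is a well-formedness error, which an XML parser reports as a fatal error, and the XML specification does not allow a parser to carry on with normal processing after one. Here is a minimal sketch using only the JDK's built-in parser. The class and method names (UnquotedAttributeDemo, firstFatalError) are made up for this example and are not from the original poster's code.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXParseException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper, for illustration only.
class UnquotedAttributeDemo {

    /** Returns null if the markup parses as XML, otherwise a description of the fatal error. */
    static String firstFatalError(String markup) {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            // DefaultHandler rethrows fatal errors and keeps the parser from
            // printing its own "[Fatal Error]" line to stderr.
            builder.setErrorHandler(new DefaultHandler());
            builder.parse(new InputSource(new StringReader(markup)));
            return null;
        } catch (SAXParseException e) {
            // The exact message text depends on the parser implementation.
            return "line " + e.getLineNumber() + ": " + e.getMessage();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }
}
```

Calling firstFatalError("<a href=x>ok</a>") returns an error description, while the same markup with the value quoted returns null. The transformer sits on top of the same kind of parser, which is why the input needs fixing before the XSLT step.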

You are going to have to do some sort of preliminary cleanup. JTidy might be able to handle it.

If the input file problems are regular, you might be able to scan it as text and patch the missing attribute quotes to a new file.
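
A rough sketch of that text-patching idea, using java.util.regex. It assumes the damage really is regular: bare attribute values with no spaces in them, and no '<' or '>' inside attribute values or script code. The class name AttributeQuoter is invented for this example. It is not a real HTML parser, so treat it as a starting point only.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, for illustration only.
class AttributeQuoter {

    // A start tag: '<', a letter, then everything up to the next '>'.
    private static final Pattern TAG = Pattern.compile("<[A-Za-z][^<>]*>");

    // One attribute: name = "double quoted" | 'single quoted' | bare value.
    // Only the bare value is captured (group 2), so quoted values pass through untouched.
    private static final Pattern ATTR = Pattern.compile(
            "([\\w:-]+)\\s*=\\s*(?:\"[^\"]*\"|'[^']*'|([^\\s\"'<>]+))");

    /** Returns the input with bare attribute values wrapped in double quotes. */
    static String quoteAttributes(String html) {
        Matcher tag = TAG.matcher(html);
        StringBuffer out = new StringBuffer();
        while (tag.find()) {
            tag.appendReplacement(out, Matcher.quoteReplacement(fixTag(tag.group())));
        }
        tag.appendTail(out);
        return out.toString();
    }

    // Rewrites the attributes of a single start tag.
    private static String fixTag(String tagText) {
        Matcher attr = ATTR.matcher(tagText);
        StringBuffer out = new StringBuffer();
        while (attr.find()) {
            String bare = attr.group(2);
            String fixed = (bare == null)
                    ? attr.group()                              // already quoted, keep as is
                    : attr.group(1) + "=\"" + bare + "\"";      // add the missing quotes
            attr.appendReplacement(out, Matcher.quoteReplacement(fixed));
        }
        attr.appendTail(out);
        return out.toString();
    }
}
```

For example, quoteAttributes("<td width=50>") gives <td width="50">. Other HTML habits that XML rejects (unclosed tags such as <br>, undeclared entities such as &nbsp;) are not handled here and would still break the parser.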

Another possibility would be to drop XSLT and scan as text, writing the CSV directly.
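
A sketch of what that could look like. I am guessing at the page layout here: it assumes each record is a <tr> row whose fields are <td> cells, with no nested tables. The class name HtmlTableToCsv is invented for this example, and the patterns would need adjusting to the real markup.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper; assumes one record per <tr> and one field per <td>.
class HtmlTableToCsv {

    private static final int FLAGS = Pattern.CASE_INSENSITIVE | Pattern.DOTALL;
    private static final Pattern ROW = Pattern.compile("<tr\\b[^>]*>(.*?)</tr>", FLAGS);
    private static final Pattern CELL = Pattern.compile("<td\\b[^>]*>(.*?)</td>", FLAGS);
    private static final Pattern INNER_TAG = Pattern.compile("<[^>]+>");

    /** Returns one CSV line per table row that contains at least one <td> cell. */
    static String toCsv(String html) {
        StringBuilder csv = new StringBuilder();
        Matcher row = ROW.matcher(html);
        while (row.find()) {
            List<String> fields = new ArrayList<>();
            Matcher cell = CELL.matcher(row.group(1));
            while (cell.find()) {
                // Drop markup inside the cell (<b>, <a ...>, and so on) and trim whitespace.
                String text = INNER_TAG.matcher(cell.group(1)).replaceAll("").trim();
                fields.add(escape(text));
            }
            if (!fields.isEmpty()) {   // rows with only <th> cells are skipped
                csv.append(String.join(",", fields)).append('\n');
            }
        }
        return csv.toString();
    }

    // Quotes a field if it contains a comma, quote, or line break; doubles embedded quotes.
    static String escape(String field) {
        if (field.contains(",") || field.contains("\"")
                || field.contains("\n") || field.contains("\r")) {
            return "\"" + field.replace("\"", "\"\"") + "\"";
        }
        return field;
    }
}
```

Because it never goes through an XML parser, unquoted attributes such as <td width=50> do not matter to this approach. Entities like &amp; are left undecoded in this sketch.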

How large a "huge" file are we talking about?

Bill
 
Mustafa Garhi
Ranch Hand
Posts: 111
Thanks man,

Around 30k records distributed over 300 HTML files, so 300 recs per HTML file, with some JavaScript and other UI code on each page.

Anyway, I think tidying up the HTML looks to be the best option.

Thanks again.
 
William Brogden
Author and all-around good cowpoke
Posts: 13078
It would be great if you could later post what your solution was, after you get this working. We tend to get similar questions frequently and your solution could help a lot of people.

Thanks
Bill
 
Mustafa Garhi
Ranch Hand
Posts: 111
I'll be sure to do that once it's done.

Thanks