
Building a DOM tree from an HTML file

 
Frank Piorko
Greenhorn
Posts: 2
Hi all,
I have been given the task of building a DOM tree from an HTML file.
I have two questions about this:
1. Does anyone know a good way to build a DOM tree from an
HTML file? (The HTML is not well-formed, so a plain DOM parser rejects it.)
2. Does anyone know a good API that can do this?
Thanks for your help.
Frank Piorko
 
Ajith Kallambella
Sheriff
Posts: 5782
Frank - anything that is not a well-formed XML document is not an XML document. You will first have to think about making it well-formed. Any XML parser will report an error if you try to parse a malformed document.
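To illustrate the point above, here is a minimal sketch using the JAXP parser that ships with the JDK. The class name WellFormedCheck and the method parseOrNull are made up for this example. It assumes the markup is available as a String and simply reports whether an XML parser accepts it.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper: tries to build a DOM tree with a standard XML parser.
class WellFormedCheck {

    // Returns the DOM document, or null if the markup is not well-formed XML.
    static Document parseOrNull(String markup) {
        try {
            DocumentBuilder builder =
                    DocumentBuilderFactory.newInstance().newDocumentBuilder();
            // DefaultHandler rethrows fatal errors instead of printing them to stderr.
            builder.setErrorHandler(new DefaultHandler());
            return builder.parse(new InputSource(new StringReader(markup)));
        } catch (SAXException e) {
            // Unclosed or badly nested tags are reported here.
            return null;
        } catch (ParserConfigurationException | IOException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

With this sketch, typical hand-written HTML such as `<p>one<br><p>two` is rejected because `<br>` and `<p>` are never closed, while the XHTML form `<p>one<br/></p>` is parsed into a DOM tree.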
 
Holger Prause
Ranch Hand
Posts: 47
Yeah - I am also searching for such a solution. I know HTML is not well-formed, but there must be some custom parser out there that builds a DOM tree from HTML.

 
Ajith Kallambella
Sheriff
Posts: 5782
Why not tweak the HTML and make it well-formed?
Remember - a malformed XML document isn't an XML document in the first place. So parsing has no meaning in that context!
 
Frank Piorko
Greenhorn
Posts: 2
I cannot make the HTML files well-formed by hand.
There are too many of them. Every few days the application
receives many HTML files from other programmers
who are not familiar with the XML/HTML problem.
 
Mapraputa Is
Leverager of our synergies
Sheriff
Posts: 10065
Frank, as Ajith said, you can convert your HTML to XHTML (well-formed HTML). You do not need to do it by hand. Just search for "Converting HTML to XHTML" on the Internet, and you'll find something like this: http://www.vbxml.com/xhtml/articles/html_to_xhtml/default.asp
Or you can check this site: http://www.xmlsoftware.com/convert/
W4F looks good.
Or try HEX, listed at http://www.xmlsoftware.com/parsers/

[This message has been edited by Mapraputa Is (edited April 30, 2001).]
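As a rough illustration of what a lenient, tool-based approach can look like, here is a sketch that uses the HTML parser bundled with Swing (javax.swing.text.html.parser.ParserDelegator) to feed a W3C DOM tree. This is not one of the tools named above; it is a swapped-in technique chosen only because it needs nothing beyond the JDK. The class name LenientHtmlDom and the wrapper element "root" are invented for the example. Attributes are dropped to keep it short, and tag names that are not valid XML names are not handled.

```java
import java.io.StringReader;
import javax.swing.text.MutableAttributeSet;
import javax.swing.text.html.HTML;
import javax.swing.text.html.HTMLEditorKit;
import javax.swing.text.html.parser.ParserDelegator;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;

// Hypothetical helper: builds a DOM tree from HTML that need not be well-formed.
class LenientHtmlDom {

    static Document parse(String html) {
        try {
            final Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder().newDocument();
            // Synthetic wrapper so the DOM always has exactly one root element.
            final Element root = doc.createElement("root");
            doc.appendChild(root);
            final Node[] current = { root }; // node that receives new children

            HTMLEditorKit.ParserCallback callback = new HTMLEditorKit.ParserCallback() {
                @Override
                public void handleStartTag(HTML.Tag t, MutableAttributeSet a, int pos) {
                    Element e = doc.createElement(t.toString());
                    current[0].appendChild(e);
                    current[0] = e; // descend into the new element
                }

                @Override
                public void handleSimpleTag(HTML.Tag t, MutableAttributeSet a, int pos) {
                    // Empty elements such as <br> get no children.
                    current[0].appendChild(doc.createElement(t.toString()));
                }

                @Override
                public void handleEndTag(HTML.Tag t, int pos) {
                    // Climb to the nearest open element with this name; ignore stray end tags.
                    for (Node n = current[0]; n != root; n = n.getParentNode()) {
                        if (n.getNodeName().equals(t.toString())) {
                            current[0] = n.getParentNode();
                            return;
                        }
                    }
                }

                @Override
                public void handleText(char[] data, int pos) {
                    current[0].appendChild(doc.createTextNode(new String(data)));
                }
            };

            // true = ignore any charset declared inside the document.
            new ParserDelegator().parse(new StringReader(html), callback, true);
            return doc;
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }
}
```

Calling `LenientHtmlDom.parse("<html><body><p>one<p>two<br></body></html>")` returns a Document that can then be walked with the usual DOM API, for example `getElementsByTagName("p")`. For a large batch of files, a dedicated HTML-to-XHTML converter is likely to cope better with unusual markup than this sketch.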
 