
Searching for Intelligent HTML Parser

 
Hussam Galal
Greenhorn
Posts: 2
Hello,

I am searching for an HTML parser to extract useful data from HTML pages, ideally one with some intelligence. I am using it to collect data from online computer hardware stores. Searching by keywords gives good results for a single web page, but when I use the parser to extract information from more than 10 pages, the results are usually poor because each page displays the information differently.

All suggestions are appreciated.
Regards,
hussam.galal@gmail.com
 
Ulf Dittmer
Rancher
Posts: 42972
You could use NekoHTML, which will give you a DOM tree of the page in question, and then use XPath to retrieve the results. You'd have to analyze the page first, in order to define the XPath, but you'd only have to do that once per site (and of course after every page redesign).
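To make the "one XPath per site" idea concrete, here is a minimal sketch using only the JDK's own XML parser and `javax.xml.xpath`. It is not NekoHTML code: the JDK parser stands in for it and only accepts well-formed markup, whereas NekoHTML would be the piece that turns messy real-world HTML into a DOM. The class name, the two store constants, and the sample pages are all made up for illustration. Also note that, as far as I recall, NekoHTML upper-cases element names by default, so with it the expressions would need to match that (or the parser would need to be configured otherwise).

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class SiteXPathExtractor {

    // One expression per site, written once after inspecting that site's markup.
    // Both stores and both expressions are hypothetical.
    static final String STORE_A_PRICE = "//td[@class='price']";
    static final String STORE_B_PRICE = "//span[@id='cost']";

    // Stand-in for the HTML parser: builds a DOM from well-formed markup only.
    static Document toDom(String markup) throws Exception {
        return DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(markup)));
    }

    // Evaluates the site's XPath against the page and returns the matched text values.
    static List<String> extract(String xpathExpr, String markup) {
        try {
            XPath xpath = XPathFactory.newInstance().newXPath();
            NodeList nodes = (NodeList) xpath.evaluate(
                    xpathExpr, toDom(markup), XPathConstants.NODESET);
            List<String> values = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                values.add(nodes.item(i).getTextContent().trim());
            }
            return values;
        } catch (Exception e) {
            throw new IllegalStateException("Could not extract with " + xpathExpr, e);
        }
    }

    public static void main(String[] args) {
        // Two stores showing the same kind of data in different markup.
        String pageA = "<html><body><table><tr><td class='price'>$199</td></tr></table></body></html>";
        String pageB = "<html><body><span id='cost'>$205</span></body></html>";
        System.out.println(extract(STORE_A_PRICE, pageA));
        System.out.println(extract(STORE_B_PRICE, pageB));
    }
}
```

The extraction code stays the same for every store; only the expression changes, which is why a redesign of one site means updating a single string.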
 
Casper Maxwell
Ranch Hand
Posts: 88
You may check this list too:

http://www.java-tips.org/content/category/9/103/45/
 