
Html Parser

 
zainu Mehmood
Ranch Hand
Posts: 45
Hi guys,
I just need a suggestion. I want to extract text from some websites: not all of the text, just certain parts, such as the 'description' on the home page. Can you please guide me on how to do that? So far I have a few solutions, but they are slow and extract all the text instead of just what I want (sorry if this is a stupid question).
Can you recommend a good HTML parser that you have used personally, and give me a hint on how to filter out, say, just the description on the home page?

best regards,
zzz
 
Jeanne Boyarsky
author & internet detective
Marshal
Posts: 37465
I've used the HTML Parser library from SourceForge. It worked well for reading and parsing. There were some issues with writing the HTML back out, but that wasn't a requirement for me.
 
Ulf Dittmer
Rancher
Posts: 42972
Instead of using an HTML library, I'd choose HtmlUnit to handle web pages. Its API is far nicer than that of HTML parsers, since it operates on a higher level (and handles web conversations as well). If I really wanted to use an HTML parser, I'd probably choose TagSoup.
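
On the "just the description" part of the question: one way to avoid extracting all the text is to react only to the element you care about while the page is being parsed. The sketch below is a rough illustration, not a recommendation from anyone in this thread. It assumes the "description" is the usual <meta name="description" content="..."> tag in the page head, and it uses the JDK's built-in SAX parser so it runs without extra jars. That parser only accepts well-formed markup; TagSoup exposes the same SAX XMLReader interface, so for real-world HTML the reader could presumably be swapped as noted in the comment. The class and method names (DescriptionExtractor, MetaDescriptionHandler, extract) are made up for this example.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper class, named for this example only.
class DescriptionExtractor {

    /** SAX handler that remembers the first <meta name="description"> it sees. */
    static class MetaDescriptionHandler extends DefaultHandler {
        String description; // stays null if the page has no description tag

        @Override
        public void startElement(String uri, String localName, String qName, Attributes atts) {
            // qName is used because the JDK parser below is not namespace-aware by default.
            if (description == null
                    && "meta".equalsIgnoreCase(qName)
                    && "description".equalsIgnoreCase(atts.getValue("name"))) {
                description = atts.getValue("content");
            }
        }
    }

    /** Returns the content of the meta description, or null if there is none. */
    static String extract(String markup) {
        try {
            // JDK SAX parser: requires well-formed markup.
            // Assumption: with TagSoup on the classpath, messy HTML could be handled with
            //   XMLReader reader = new org.ccil.cowan.tagsoup.Parser();
            XMLReader reader = SAXParserFactory.newInstance().newSAXParser().getXMLReader();
            MetaDescriptionHandler handler = new MetaDescriptionHandler();
            reader.setContentHandler(handler);
            reader.parse(new InputSource(new StringReader(markup)));
            return handler.description;
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse markup", e);
        }
    }

    public static void main(String[] args) {
        // Made-up sample page; body text is ignored because the handler never collects it.
        String page = "<html><head><title>Home</title>"
                + "<meta name=\"description\" content=\"A small test page\"/>"
                + "</head><body><p>Lots of other text</p></body></html>";
        System.out.println(extract(page));
    }
}
```

Because the handler never collects character data, the rest of the page is skipped rather than extracted and filtered afterwards. If the description you want lives somewhere else (say, a particular div), the same idea applies with a different check in startElement.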
 