
text extractors from web pages

 
Ali Khalfan
Ranch Hand
Posts: 129
Hi,

I'm trying to implement a search engine over selected articles from all around the web. The thing is, to make the search work I'll need to extract the right content from the page source: no JavaScript, no Flash, no header, no footer, etc. Just the actual sentences of the article.

Does anyone know of an API that could do this? I saw LingPipe (and even used it, but it is fairly slow and takes up a lot of space): http://alias-i.com/lingpipe/index.html

So if I were to extract anything from this page, it should be just what I write and the replies, as well as the subject (not the header above or the URL for the deer with one eye).
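
To make the requirement concrete, here is a rough sketch of the kind of filtering I mean, written with Python's standard `html.parser` module. The names `SKIP_TAGS`, `VisibleTextExtractor` and `extract_text` are made up for illustration, and the list of tags to skip is only my guess at what counts as "not content". It also assumes every skipped tag is properly closed, which real pages do not always guarantee.

```python
from html.parser import HTMLParser

# Assumed list of elements whose contents should not be indexed.
# Only non-void elements belong here, since each one must have an end tag.
SKIP_TAGS = {"script", "style", "noscript", "header", "footer", "nav", "object"}


class VisibleTextExtractor(HTMLParser):
    """Collects text that is not nested inside any tag listed in SKIP_TAGS."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self._skip_depth = 0   # > 0 while inside at least one skipped element
        self._chunks = []      # text fragments kept so far

    def handle_starttag(self, tag, attrs):
        if tag in SKIP_TAGS:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in SKIP_TAGS and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep only non-empty text found outside skipped elements.
        if self._skip_depth == 0:
            text = data.strip()
            if text:
                self._chunks.append(text)

    def text(self):
        return " ".join(self._chunks)


def extract_text(html):
    """Return the text of an HTML string with the skipped elements removed."""
    parser = VisibleTextExtractor()
    parser.feed(html)
    parser.close()
    return parser.text()


if __name__ == "__main__":
    page = (
        "<html><head><title>text extractors</title>"
        "<script>var x = 1;</script></head>"
        "<body><header>Site banner</header>"
        "<p>Hello &amp; welcome</p>"
        "<footer>Copyright</footer></body></html>"
    )
    print(extract_text(page))
```

This only drops whole elements by tag name. It cannot tell an article body from a sidebar that happens to sit in a plain `div`, which is why I'm looking for a proper API.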
 