
Crawling with Apache Nutch

 
Anoop Isaac
Greenhorn
Posts: 6
I am trying to crawl web pages using Apache Nutch and index them with Solr. Is it possible to index only the modified pages in Solr? I tried using 'TextProfileSignature' as the signature class that creates the digest, but in spite of the digest holding the same value across fetches, the page is still indexed again in Solr.
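A minimal sketch of the configuration described above, assuming the signature class is set through the standard `db.signature.class` property in `conf/nutch-site.xml`. The original post does not show the actual file, so treat this as an illustration of the setup rather than the poster's exact settings:

```xml
<?xml version="1.0"?>
<configuration>
  <!-- Assumption: overrides the default MD5Signature so the digest is
       computed from a text profile of the page instead of raw content. -->
  <property>
    <name>db.signature.class</name>
    <value>org.apache.nutch.crawl.TextProfileSignature</value>
  </property>
</configuration>
```

This only controls how the digest stored in the crawl database is computed. Whether an unchanged digest keeps a page from being sent to Solr again is the open question of the post.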
 