Lucene Hits

 
Ranch Hand
Hi,
IndexSearcher in Lucene accepts the query and returns a Hits object. As stated in one tutorial, Lucene is an IR (information retrieval) library rather than a search engine. Does the implementor need to build a cache/crawler for even faster searching/indexing?
Also, how are the results returned? According to the tutorial(s) on the net, it uses a score for a page (a Document in general). How different is this from Google's PageRank? To my knowledge, PageRank calculates the score not only from how often the page is accessed but also from its backlinks (the total number of pages pointing to that page). How is the score of a Document calculated in Lucene?
Does Hit stand for Hypertext Induced Topic Selection, the algorithm used to rank documents?
Thanks
Arjun
[ January 06, 2005: Message edited by: Arjun Shastry ]
 
Author

I call Lucene a "search engine" because it's a convenient and recognizable term. Technically it is an API that has no user interface, no crawler, and no parsers. To me, it is the "engine", whereas Google is a search "application". Semantics and word games aside, it is not necessary to implement caching around Lucene. The Hits object itself has some built-in caching for the most recently accessed (or soon-to-be-accessed) documents.
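To make the caching idea concrete, here is a minimal, hypothetical sketch of a Hits-like result list that loads documents lazily and keeps only the most recently accessed ones in memory. It is not Lucene's actual Hits code: the class name CachedHits, the loader function standing in for reading a stored document, and the cache size are all assumptions for illustration.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.IntFunction;

/**
 * Hypothetical sketch (not Lucene's Hits source): a ranked result list that
 * loads documents on demand and keeps only the most recently used ones.
 */
public class CachedHits {
    private final int[] docIds;               // matching doc ids, already in rank order
    private final IntFunction<String> loader; // stands in for reading a stored document
    private final Map<Integer, String> cache; // LRU cache keyed by rank
    private int loads = 0;                    // how many times the loader actually ran

    public CachedHits(int[] docIds, IntFunction<String> loader, int maxCached) {
        this.docIds = docIds;
        this.loader = loader;
        // accessOrder = true: iteration order is least recently accessed first,
        // so the "eldest" entry is the one to evict when the cache is full
        this.cache = new LinkedHashMap<Integer, String>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<Integer, String> eldest) {
                return size() > maxCached;
            }
        };
    }

    public int length() {
        return docIds.length;
    }

    /** Returns the document at rank n, calling the loader only on a cache miss. */
    public String doc(int n) {
        String d = cache.get(n);
        if (d == null) {
            d = loader.apply(docIds[n]);
            loads++;
            cache.put(n, d);
        }
        return d;
    }

    public int loads() {
        return loads;
    }

    public static void main(String[] args) {
        CachedHits hits = new CachedHits(new int[] {7, 3, 9}, id -> "doc#" + id, 2);
        hits.doc(0);
        hits.doc(0); // served from the cache, no second load
        hits.doc(1);
        System.out.println(hits.doc(0) + " loads=" + hits.loads());
    }
}
```

Paging through results in order (or re-reading the top few) mostly hits the cache, which is the usage pattern a search results page tends to have.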

Hits from Lucene are ordered by score, a sophisticated calculation that puts the documents most relevant to the query at the top and less relevant documents below.
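For a rough idea of what goes into that score, below is a simplified sketch modeled on Lucene's classic TF-IDF formula: term frequency as sqrt(freq), idf = 1 + ln(numDocs / (docFreq + 1)) applied squared, a length norm of 1/sqrt(numTerms), and a coord factor for the fraction of query terms matched. Boosts, query normalization, and norm encoding are left out, and the class TfIdfRanker is hypothetical; real Lucene reads these statistics from an inverted index rather than scanning token lists.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

/** Hypothetical, simplified TF-IDF ranker in the spirit of classic Lucene scoring. */
public class TfIdfRanker {
    // Each document is a list of already-tokenized, lowercased terms (an assumption).
    private final List<List<String>> docs;

    public TfIdfRanker(List<List<String>> docs) {
        this.docs = docs;
    }

    // Number of documents containing the term.
    private int docFreq(String term) {
        int df = 0;
        for (List<String> d : docs) {
            if (d.contains(term)) df++;
        }
        return df;
    }

    // idf = 1 + ln(numDocs / (docFreq + 1))
    private double idf(String term) {
        return 1.0 + Math.log(docs.size() / (double) (docFreq(term) + 1));
    }

    /** coord * sum over matching terms of sqrt(freq) * idf^2 * 1/sqrt(docLength). */
    public double score(List<String> query, int docIndex) {
        if (query.isEmpty()) return 0.0;
        List<String> doc = docs.get(docIndex);
        double lengthNorm = 1.0 / Math.sqrt(doc.size());
        double sum = 0.0;
        int matched = 0;
        for (String t : query) {
            int freq = Collections.frequency(doc, t);
            if (freq == 0) continue;
            matched++;
            double idf = idf(t);
            sum += Math.sqrt(freq) * idf * idf * lengthNorm;
        }
        double coord = matched / (double) query.size();
        return coord * sum;
    }

    /** Indices of documents matching at least one query term, best score first. */
    public List<Integer> rank(List<String> query) {
        List<Integer> hits = new ArrayList<>();
        for (int i = 0; i < docs.size(); i++) {
            if (score(query, i) > 0) hits.add(i);
        }
        hits.sort((a, b) -> Double.compare(score(query, b), score(query, a)));
        return hits;
    }

    public static void main(String[] args) {
        List<List<String>> docs = Arrays.asList(
                Arrays.asList("lucene", "in", "action"),
                Arrays.asList("lucene", "lucene", "search", "engine", "api"),
                Arrays.asList("google", "pagerank"));
        TfIdfRanker ranker = new TfIdfRanker(docs);
        List<String> query = Arrays.asList("lucene", "search");
        for (int i : ranker.rank(query)) {
            System.out.printf("doc %d score=%.3f%n", i, ranker.score(query, i));
        }
    }
}
```

In the example, doc 1 ranks above doc 0 because it matches both query terms (full coord), repeats "lucene", and contains the rarer term "search"; doc 2 matches nothing and is not a hit at all.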

Google's PageRank is comparable to how Nutch, a system built around Lucene, ranks its documents. It does lots of Lucene trickery to weight documents in a PageRank-like fashion. Most of us, however, are not building web crawlers, which is where PageRank works decently. In intranets or other domains, the built-in Lucene scoring mechanism works amazingly well.
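For contrast, PageRank is computed from the link graph alone, independent of any query. The sketch below is a plain power iteration over a tiny hypothetical three-page graph, followed by one assumed way of folding the result into a query-relevance score (a simple multiplication). It is an illustration only, not how Nutch or Google actually does it; the damping factor 0.85 is the commonly cited value, and the text scores in main are made up.

```java
import java.util.Arrays;

/** Hypothetical PageRank sketch: power iteration over a small link graph. */
public class PageRankSketch {
    /**
     * links[i] lists the pages that page i links to; d is the damping factor.
     * Pages with no outgoing links spread their rank evenly over all pages.
     */
    static double[] pageRank(int[][] links, double d, int iterations) {
        int n = links.length;
        double[] rank = new double[n];
        Arrays.fill(rank, 1.0 / n);
        for (int it = 0; it < iterations; it++) {
            double[] next = new double[n];
            Arrays.fill(next, (1.0 - d) / n); // "random jump" share for every page
            for (int i = 0; i < n; i++) {
                if (links[i].length == 0) {
                    for (int j = 0; j < n; j++) next[j] += d * rank[i] / n;
                } else {
                    for (int target : links[i]) next[target] += d * rank[i] / links[i].length;
                }
            }
            rank = next;
        }
        return rank;
    }

    public static void main(String[] args) {
        // page 0 -> 1, page 1 -> 2, page 2 -> 1: page 1 has the most backlinks
        int[][] links = { {1}, {2}, {1} };
        double[] pr = pageRank(links, 0.85, 50);
        double[] textScore = {0.9, 0.5, 0.5}; // hypothetical query-relevance scores
        for (int i = 0; i < pr.length; i++) {
            // assumed combination: static link score used as a multiplicative boost
            System.out.printf("page %d: pagerank=%.3f combined=%.3f%n",
                    i, pr[i], textScore[i] * pr[i]);
        }
    }
}
```

Note that page 0 has the best text score but no backlinks, so a link-based boost pushes it down; on an intranet with few meaningful links that kind of signal adds little, which is why plain relevance scoring usually suffices there.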

I have never heard of that acronym for Hit, and I do not think it applies to Lucene's concept of a Hit. A "hit" is synonymous with a "match".
 
Don't get me started about those stupid light bulbs.