Otis Gospodnetic

Author
since Dec 30, 2004

Recent posts by Otis Gospodnetic

Jeanne Boyarsky wrote:Winner picking is delayed due to technical errors. I will replace this thread with the winners when the problem is resolved.

First, a big thanks to Michael McCandless, Erik Hatcher, and Otis Gospodnetic for being here to promote the book Lucene in Action.

The winners are:

Lester Burnham
Joachim Rohde
Jigar Naik
Ankit Garg



Congratulations and enjoy the book!

Otis

Gian Franco wrote:Open-source products, like the dominant Lucene, of course do not require
licensing costs but might demand development resources...

If proprietary products bundle or work with open-source components
such as Lucene, it helps them to have more complete and cheaper
solutions on their shelves.

Cheers,

Gian



Example: Attivio uses Lucene at its core, but adds some unique functionality. It ain't cheap, though!

Otis

Gian Franco wrote:Hi,

An incredibly handy companion tool to Lucene is Luke...

What do you think of Luke and are there any other
similar tools you would suggest?

Cheers,

Gian



Luke is the best such tool out there. The first edition of Lucene in Action (LIA1) covers some other ones, too.
There is also http://code.google.com/p/lucene-sql/

Otis

Pradeep bhatt wrote:

Otis Gospodnetic wrote:

Pradeep bhatt wrote:Who are the competitors to Lucene ?



People sometimes use Sphinx or Xapian.
Or Endeca, or FAST, or Google Search Appliance, or... or they use Solr.

Otis



Thanks again, author. How do they stand against Lucene?



The commercial ones are losing in the long run.
The free ones are behind in terms of adoption and community, and maybe some other things. They may be better at some things, too.

Otis

Samar Chauhan wrote:Let me define the design problem: create a search facility on the intranet for particular topics in an ever-growing pool of PDF books. What should I consider when combining a Java desktop or browser client with searching the database or file system through Lucene/Solr? Adding a PDF should be user-friendly (i.e. indexing should happen along with it). Can I have your inputs for the basic design?



Pointer: Solr Cell - http://wiki.apache.org/solr/ExtractingRequestHandler

Otis

Pradeep bhatt wrote:Who are the competitors to Lucene ?



People sometimes use Sphinx or Xapian.
Or Endeca, or FAST, or Google Search Appliance, or... or they use Solr.

Otis

Pradeep bhatt wrote:Suppose I want to search for the word "run" and my documents are in different languages (English, French, etc.). Would the search results return documents containing words that mean "run"? Can we write some custom code to do this?



Lucene can't understand the meaning of the word - it simply matches tokens. Which tokens get indexed depends on what the analysis does with the input text. This is well covered in LIA.
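
To make the token-matching point concrete, here is a minimal sketch in plain Java. It does not use Lucene's API: the class TokenMatchSketch, its methods, and the toy "analysis" (lowercase, split on non-letters) are all made up for illustration. It only shows that a search for "run" finds documents that contain the token "run" and nothing else.

```java
import java.util.*;

class TokenMatchSketch {
    // Toy "analysis": lowercase the text and split on anything that is not a letter.
    // This stands in for whatever a real analyzer would do; it is an assumption.
    static List<String> analyze(String text) {
        List<String> tokens = new ArrayList<>();
        for (String t : text.toLowerCase(Locale.ROOT).split("[^\\p{L}]+")) {
            if (!t.isEmpty()) tokens.add(t);
        }
        return tokens;
    }

    // Inverted index: token -> ids of the documents that contain it.
    static Map<String, Set<Integer>> index(List<String> docs) {
        Map<String, Set<Integer>> postings = new HashMap<>();
        for (int id = 0; id < docs.size(); id++) {
            for (String token : analyze(docs.get(id))) {
                postings.computeIfAbsent(token, k -> new TreeSet<>()).add(id);
            }
        }
        return postings;
    }

    // A single-term search is an exact lookup of the analyzed term.
    static Set<Integer> search(Map<String, Set<Integer>> postings, String term) {
        List<String> q = analyze(term);
        if (q.isEmpty()) return Collections.emptySet();
        return postings.getOrDefault(q.get(0), Collections.emptySet());
    }

    public static void main(String[] args) {
        List<String> docs = Arrays.asList(
            "I run every morning",     // contains the token "run"
            "Je vais courir demain",   // same meaning in French, different token
            "Running is fun");         // "running" is not the token "run"
        Map<String, Set<Integer>> postings = index(docs);
        System.out.println(search(postings, "run")); // prints [0]
    }
}
```

Only the first document matches. Getting the other two would take extra work at analysis time (stemming for "running", translation or synonyms for "courir"), which is exactly the part that depends on what the analysis does with the input.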

Otis

Gian Franco wrote:...is the hot backup a new feature?



Yes, it's relatively new.

Otis

There is another relevant solution: synonym injection via the Analyzer. Here is some context: http://www.lucenebook.com/search?query=synonym

The code that comes with the book includes a synonym engine.
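
As a rough idea of what synonym injection means, here is a small sketch in plain Java. It is not the book's synonym engine and not Lucene's Analyzer API: the class name, the hard-coded synonym table, and the methods are hypothetical. The point is only that synonyms are added at the same position as the original token during analysis, so a later search for a synonym still matches.

```java
import java.util.*;

class SynonymInjectionSketch {
    // Hypothetical synonym table; a real one would come from a dictionary or thesaurus.
    static final Map<String, List<String>> SYNONYMS = new HashMap<>();
    static {
        SYNONYMS.put("quick", Arrays.asList("fast", "speedy"));
        SYNONYMS.put("jumps", Arrays.asList("leaps", "hops"));
    }

    // For each token position, return the original token followed by its synonyms.
    static List<List<String>> analyze(String text) {
        List<List<String>> positions = new ArrayList<>();
        for (String t : text.toLowerCase(Locale.ROOT).split("\\W+")) {
            if (t.isEmpty()) continue;
            List<String> atPosition = new ArrayList<>();
            atPosition.add(t);
            atPosition.addAll(SYNONYMS.getOrDefault(t, Collections.emptyList()));
            positions.add(atPosition);
        }
        return positions;
    }

    // A term matches if it equals the token or any injected synonym at some position.
    static boolean matches(String docText, String term) {
        String needle = term.toLowerCase(Locale.ROOT);
        for (List<String> atPosition : analyze(docText)) {
            if (atPosition.contains(needle)) return true;
        }
        return false;
    }

    public static void main(String[] args) {
        String doc = "The quick fox jumps";
        System.out.println(matches(doc, "fast")); // prints true, via the synonym of "quick"
        System.out.println(matches(doc, "slow")); // prints false
    }
}
```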

Otis

Let me just add a bit to the answer about how a search based on regular expressions compares to what Lucene does. Think about a large Web-wide index, like the one you search with Google, AlltheWeb, Teoma, WiseNut, or Yahoo. Imagine trying to search that using just regular expressions. Pretty funny to imagine.
Actually, I did explain this in the book, and the first result for the following query gives you some info: http://www.lucenebook.com/search?query=sequentially (the first hit is from a free sample chapter, so you can get the whole thing and read it).
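
The difference can be sketched in a few lines of plain Java. Nothing here is Lucene code, and the class and method names are invented for the example. The regex search has to read every document on every query, while the inverted index is built once and then answers a query with one map lookup.

```java
import java.util.*;
import java.util.regex.Pattern;

class ScanVersusIndexSketch {
    // Sequential approach: run a regular expression over every document, every time.
    static List<Integer> regexScan(List<String> docs, String word) {
        Pattern p = Pattern.compile("\\b" + Pattern.quote(word) + "\\b",
                                    Pattern.CASE_INSENSITIVE);
        List<Integer> hits = new ArrayList<>();
        for (int id = 0; id < docs.size(); id++) {
            if (p.matcher(docs.get(id)).find()) hits.add(id);
        }
        return hits;
    }

    // Index approach: pay the cost of reading all documents once, up front.
    static Map<String, List<Integer>> buildIndex(List<String> docs) {
        Map<String, List<Integer>> postings = new HashMap<>();
        for (int id = 0; id < docs.size(); id++) {
            String[] raw = docs.get(id).toLowerCase(Locale.ROOT).split("\\W+");
            // De-duplicate tokens per document so each id is listed once per token.
            for (String t : new LinkedHashSet<>(Arrays.asList(raw))) {
                if (!t.isEmpty()) postings.computeIfAbsent(t, k -> new ArrayList<>()).add(id);
            }
        }
        return postings;
    }

    // A query is then a single lookup, independent of how many documents exist.
    static List<Integer> indexLookup(Map<String, List<Integer>> postings, String word) {
        return postings.getOrDefault(word.toLowerCase(Locale.ROOT), Collections.emptyList());
    }

    public static void main(String[] args) {
        List<String> docs = Arrays.asList(
            "Lucene is a search library",
            "Regular expressions scan text",
            "Search engines build an index");
        System.out.println(regexScan(docs, "search"));                // prints [0, 2]
        System.out.println(indexLookup(buildIndex(docs), "search")); // prints [0, 2]
    }
}
```

Both give the same answer on three documents. On a Web-sized collection, the first approach means reading everything for each query, which is why it is funny to imagine.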

Otis

Oh, Nutch has a lot in common with Lucene. Not only was Nutch created by the same person who created Lucene, Nutch also uses Lucene for the actual indexing and searching. Also, Lucene in Action includes a Nutch case study in the Case Studies chapter (chapter 10). Check this: http://www.lucenebook.com/search?query=nutch

Otis

Axel,

I don't read that person's blog regularly, but I somehow did stumble across that particular post. I would take judgements of Jakarta with a grain of salt, although it is true that Lucene is a remarkably stable and solid open-source project (not just within Apache Jakarta). In my opinion, this is primarily due to the great leadership of Lucene's creator, Doug Cutting.

Otis

Hello,

PDF -> text extraction generally falls outside the scope of each individual Lucene port. Typically you use an external, independent application or library (e.g. PDFBox; as a matter of fact, you can see that in chapter 7 - http://www.lucenebook.com/search?query=pdfbox ).

As for the quality of the ports, I am on various ports' mailing lists, and my impression is that CLucene, dotLucene, and PyLucene are all very active and probably of solid quality, judging by the people behind them. Lupy is lagging more, and I know it doesn't support quite everything that the original Lucene does.

Otis

Greg,

Here are some mentions of big Lucene deployments: http://www.lucenebook.com/search?query=mayo (look at the context around the highlighted keyword). You can get a longer list at http://wiki.apache.org/jakarta-lucene/PoweredBy . My most recent big use of Lucene is in Simpy - http://www.simpy.com/ , where I have thousands of small Lucene indices.

Otis

Hi,

Lucene is all about text indexing and full-text searching. It's a full-text library/toolkit that you can use to add searching capabilities to your applications.

You will find a lot of Lucene resources (articles, tutorials, etc.) at http://wiki.apache.org/jakarta-lucene/IntroductionToLucene and at http://www.java201.com/resources/browse/38-all.html . You could also grab the free chapter from Lucene in Action, chapter 1. It explains what Lucene is and how it is used. Chapter 1 can be downloaded from http://www.manning-source.com/books/hatcher2/hatcher2_chp1.pdf

Otis