A Question on Lucene

 
Joe Harry
Ranch Hand
Guys,

I have a question about using Lucene to search and serve HTML content for my web app. How do I read the HTML documents and index their content so that it is searchable? Are there any good references other than the demo app that comes with the Lucene download?
 
Joe Harry
Ranch Hand
Do Apache Solr and Lucene complement each other? What is the difference between the two?
 
Ulf Dittmer
Rancher
Reading through the websites of both Solr and Lucene, they don't sound similar. If this is for the project you mentioned elsewhere, then Lucene is almost certainly the proper choice.

With respect to HTML, I think Lucene comes with an example you should be able to adapt. If you're serious about it, then you really should work through "Lucene in Action"; it'll save you much time and effort.
 
Joe Harry
Ranch Hand
Yes, I'm planning to give the community project I'm working on Lucene-powered search so that users can search for articles. I'm using the Lucene demo and building on top of that, but there are certain things I would like to customize and certain things I need to understand. Lucene in Action looks promising. Will give it a try.
 
Joe Harry
Ranch Hand
Well, Lucene in Action says that Solr is a crawler.
 
Joe Harry
Ranch Hand
Got hold of Tika for content extraction and it really made my life easier.
 
John Jai
Rancher
If Joe's still around:
Correct my understanding: you used Apache Tika to convert the HTML files into indexable text, and then indexed that text with Lucene so it could be searched?
 
R Hoefer
Greenhorn
I'm not Joe, but that's starting on the right track.
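To make the extract-then-index idea concrete, here is a minimal sketch of the pipeline described above. It deliberately uses only the Java standard library and is not the Tika or Lucene API: a crude regex tag stripper stands in for Tika's content extraction, and a small in-memory inverted index stands in for a Lucene index. The class and method names (HtmlIndexSketch, extractText, add, search) are made up for illustration, and real HTML needs a proper parser rather than regexes.

```java
import java.util.*;
import java.util.regex.Pattern;

/** Toy stand-in for "extract text from HTML, then index and search it". */
class HtmlIndexSketch {
    // Crude extraction step (assumption: well-formed markup). A real extractor
    // such as Tika also handles entities, encodings, and broken HTML.
    private static final Pattern SCRIPT_OR_STYLE =
            Pattern.compile("(?is)<(script|style)[^>]*>.*?</\\1>");
    private static final Pattern TAG = Pattern.compile("(?s)<[^>]+>");

    // Inverted index: term -> ids of the documents that contain it.
    private final Map<String, Set<String>> postings = new HashMap<>();

    /** Drops script/style blocks and tags, leaving whitespace-normalized text. */
    static String extractText(String html) {
        String withoutScripts = SCRIPT_OR_STYLE.matcher(html).replaceAll(" ");
        return TAG.matcher(withoutScripts).replaceAll(" ")
                .replaceAll("\\s+", " ").trim();
    }

    /** Lowercases and splits on anything that is not a letter or digit. */
    static List<String> tokenize(String text) {
        List<String> terms = new ArrayList<>();
        for (String t : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
            if (!t.isEmpty()) terms.add(t);
        }
        return terms;
    }

    /** Indexing step: extract the text, then record each term for this document. */
    void add(String docId, String html) {
        for (String term : tokenize(extractText(html))) {
            postings.computeIfAbsent(term, k -> new TreeSet<>()).add(docId);
        }
    }

    /** Search step: ids of documents containing every query term (AND semantics). */
    Set<String> search(String query) {
        Set<String> result = null;
        for (String term : tokenize(query)) {
            Set<String> docs = postings.getOrDefault(term, Collections.emptySet());
            if (result == null) result = new TreeSet<>(docs);
            else result.retainAll(docs);
        }
        return result == null ? new TreeSet<>() : result;
    }

    public static void main(String[] args) {
        HtmlIndexSketch index = new HtmlIndexSketch();
        index.add("a.html", "<html><body><h1>Lucene basics</h1>"
                + "<p>Indexing HTML articles.</p></body></html>");
        index.add("b.html", "<html><body><p>Notes on Tika and text extraction.</p></body></html>");
        System.out.println(index.search("html articles")); // prints [a.html]
        System.out.println(index.search("tika"));          // prints [b.html]
    }
}
```

In a real project the extraction step would be handed to Tika and the index and search steps to Lucene, which adds analyzers, relevance ranking, and on-disk storage that this sketch leaves out.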


 
John Jai
Rancher
Thanks Hoefer!
 
R Hoefer
Greenhorn
Since I'm thinking of it, have you heard of Luke? http://code.google.com/p/luke/

Nice tool for inspecting and managing Lucene indexes in a GUI. I wish someone had told me about it when I started messing with Lucene.
 