Lucene and index updation

 
Ranch Hand
How does the index get updated when the content of a file changes?
 
Rancher
Not automatically. The indexing code must explicitly delete the old index entry and re-index the file.
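For illustration, here is a minimal sketch of that delete-then-re-index step on a toy in-memory inverted index. `ToyIndex` is a hypothetical class made up for this example, not Lucene code. In Lucene itself, `IndexWriter.updateDocument(Term, Document)` does the same delete-then-add in one call, with the term identifying the old document (for example, a field holding the file path).

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical toy inverted index (NOT Lucene): term -> ids of documents containing it.
class ToyIndex {
    private final Map<String, Set<String>> postings = new HashMap<>();
    // Each document's terms, so its stale postings can be removed later.
    private final Map<String, Set<String>> termsByDoc = new HashMap<>();

    // Re-index one file: drop the old entry first, then add the new content.
    void update(String docId, String content) {
        delete(docId);
        Set<String> terms = new HashSet<>();
        for (String token : content.toLowerCase().split("\\W+")) {
            if (!token.isEmpty()) {
                terms.add(token);
            }
        }
        for (String term : terms) {
            postings.computeIfAbsent(term, t -> new HashSet<>()).add(docId);
        }
        termsByDoc.put(docId, terms);
    }

    // Remove every posting that points at docId.
    void delete(String docId) {
        Set<String> old = termsByDoc.remove(docId);
        if (old == null) {
            return;
        }
        for (String term : old) {
            Set<String> docs = postings.get(term);
            docs.remove(docId);
            if (docs.isEmpty()) {
                postings.remove(term);
            }
        }
    }

    Set<String> search(String term) {
        return new HashSet<>(postings.getOrDefault(term.toLowerCase(), Collections.emptySet()));
    }
}
```

Without the `delete` call, a search for a word that was removed from the file would still return it.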
 
Pradeep bhatt
Ranch Hand
OK. That would mean searches may miss some files because the index has not been updated yet.
 
Lester Burnham
Rancher
Correct. The indexing code needs to strike a balance between freshness of the index and performance when deciding how often to check for updated files/content.
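One way to make that trade-off explicit is to run the check on a timer whose interval is the tuning knob. A minimal sketch, assuming a hypothetical `PeriodicReindexer` class built on the JDK's `ScheduledExecutorService`; the re-index pass itself is a `Runnable` you would supply.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.TimeUnit;

// Hypothetical helper: runs a re-index pass repeatedly.
// A shorter interval keeps the index fresher; a longer one costs less I/O and CPU.
class PeriodicReindexer {
    private final ScheduledExecutorService scheduler =
            Executors.newSingleThreadScheduledExecutor();

    ScheduledFuture<?> start(Runnable reindexPass, long interval, TimeUnit unit) {
        // Fixed delay: the gap between the end of one pass and the start of
        // the next is always `interval`, even when a pass runs long.
        return scheduler.scheduleWithFixedDelay(reindexPass, 0, interval, unit);
    }

    void stop() {
        scheduler.shutdownNow();
    }
}
```

Fixed delay was chosen over fixed rate so that a slow pass pushes the next one back instead of letting the next pass start immediately after it finishes.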
 
Ranch Hand
Yes, you need to commit after you update or delete documents in the index.

Deleted documents are still present in the index (marked for deletion), but they don't come up in search results or ranking.
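A toy model of that marking behaviour, using a hypothetical `ToySegment` class (not Lucene's implementation): a delete only flips a bit, searches skip flagged documents, and the space is reclaimed only when the segment is rewritten by a merge, which is roughly how Lucene treats deletions within a segment.

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.List;

// Hypothetical toy segment (NOT Lucene): deleting a document only sets a bit;
// the stored document stays until the segment is merged.
class ToySegment {
    private final List<String> docs = new ArrayList<>();
    private final BitSet deleted = new BitSet();

    int add(String doc) {
        docs.add(doc);
        return docs.size() - 1; // document number within this segment
    }

    void delete(int docNum) {
        deleted.set(docNum); // mark only; nothing is physically removed yet
    }

    // Searches skip marked documents, so deleted docs never show up in results.
    List<String> search(String word) {
        List<String> hits = new ArrayList<>();
        for (int i = 0; i < docs.size(); i++) {
            if (!deleted.get(i) && docs.get(i).contains(word)) {
                hits.add(docs.get(i));
            }
        }
        return hits;
    }

    int storedDocs() {
        return docs.size();
    }

    // A merge rewrites the segment without the deleted documents.
    ToySegment merge() {
        ToySegment merged = new ToySegment();
        for (int i = 0; i < docs.size(); i++) {
            if (!deleted.get(i)) {
                merged.add(docs.get(i));
            }
        }
        return merged;
    }
}
```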

 
blacksmith

Pradeep bhatt wrote: How does the index get updated when the content of a file changes?

...it all depends on the tools used to manage the files.

If all your tools are aware of the Lucene index underneath
and they all take care of keeping the index up to date, then
you are fine: each tool can funnel its changes to the files
through Lucene's API...
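A minimal sketch of that approach, assuming every tool writes files through a single hypothetical `IndexedFileStore` class. `IndexUpdater` is also hypothetical; it stands in for whatever code actually talks to Lucene.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical stand-in for the code that updates the search index.
interface IndexUpdater {
    void reindex(Path file, String content);
    void remove(Path file);
}

// Hypothetical wrapper: all tools change files through this class,
// so the index is updated in the same call as the file.
class IndexedFileStore {
    private final IndexUpdater index;

    IndexedFileStore(IndexUpdater index) {
        this.index = index;
    }

    void write(Path file, String content) {
        try {
            Files.writeString(file, content, StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        index.reindex(file, content); // only reached if the file write succeeded
    }

    void delete(Path file) {
        try {
            Files.deleteIfExists(file);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        index.remove(file);
    }
}
```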

If the files can be managed by a tool that does not update
the index, then you will likely need a batch process
that goes over the files periodically and makes sure that
the index is up to date.
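A sketch of such a batch pass, assuming file modification times are a good enough change signal (a tool that preserves timestamps would defeat it; comparing content hashes is more robust but slower). `ChangeScanner` is a hypothetical name; it only reports new or modified files and leaves the re-indexing, and the handling of deleted files, to the caller.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.attribute.FileTime;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical batch pass: reports files that are new or whose last-modified
// time differs from the previous pass.
class ChangeScanner {
    // Last-modified time of every file seen on the previous pass.
    private final Map<Path, FileTime> lastSeen = new HashMap<>();

    List<Path> changedSince(Path root) {
        List<Path> changed = new ArrayList<>();
        try {
            List<Path> files;
            try (Stream<Path> walk = Files.walk(root)) {
                files = walk.filter(Files::isRegularFile).collect(Collectors.toList());
            }
            for (Path file : files) {
                FileTime modified = Files.getLastModifiedTime(file);
                FileTime previous = lastSeen.put(file, modified);
                if (!modified.equals(previous)) { // a null previous means a new file
                    changed.add(file);
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return changed;
    }
}
```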

Cheers,

Gian
 
author
One of the nice recent additions to Lucene is a feature called near-real-time search (it's covered in the book -- the IndexWriter.getReader method).

This makes the turnaround time between making changes (adds/deletes/updates) to the index and opening a new searcher that can see them much faster, because you no longer have to call .commit or .close on the IndexWriter to see the changes.
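A toy model of the visibility difference, using a hypothetical `ToyWriter` class (not Lucene's API): a reader opened on the last commit sees only committed documents, while a near-real-time reader obtained from the writer also sees changes the writer has not committed yet. In Lucene 2.9 the latter kind of reader is what `IndexWriter.getReader` returns.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical toy writer (NOT Lucene) modelling when a searcher can see changes.
class ToyWriter {
    private final List<String> committed = new ArrayList<>(); // contents of the last commit
    private final List<String> pending = new ArrayList<>();   // changes not yet committed

    void add(String doc) {
        pending.add(doc);
    }

    // Commit: pending changes become part of the commit point.
    void commit() {
        committed.addAll(pending);
        pending.clear();
    }

    // A reader opened from the last commit sees only committed documents.
    List<String> openCommittedReader() {
        return List.copyOf(committed);
    }

    // Near-real-time: a reader taken from the writer itself also sees
    // uncommitted changes, without waiting for a commit.
    List<String> openNearRealTimeReader() {
        List<String> view = new ArrayList<>(committed);
        view.addAll(pending);
        return List.copyOf(view);
    }
}
```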
 
Pradeep bhatt
Ranch Hand

Michael McCandless wrote: One of the nice recent additions to Lucene is a feature called near-real-time search (it's covered in the book -- the IndexWriter.getReader method).

This makes the turnaround time between making changes (adds/deletes/updates) to the index and opening a new searcher that can see them much faster, because you no longer have to call .commit or .close on the IndexWriter to see the changes.

Thanks for sharing this useful piece of information. In which version was this introduced?
 
Michael McCandless
author
You're welcome!

Near-real-time search was added in 2.9.
 
Pradeep bhatt
Ranch Hand
Thanks for the info. Do you recommend backing up the index files for critical applications? I think that if the index files get lost, rebuilding them would take a long time when we are dealing with huge amounts of data.
 
Aneesh Vijendran
Ranch Hand
Yes, very much. Index files getting corrupted is not uncommon. It happened to me once when the index contained data with some unknown/ASCII/Unicode characters.

When we indexed some 30k documents from the Document Manager, the index came to 7-8 GB.

Imagine if this gets corrupted ;)

Cheers
Aneesh
 
Author

Pradeep bhatt wrote: Do you recommend backing up the index files for critical applications?

Using the word "critical" here answers your own question: under what circumstances would you *not* back up "critical" data?!
 
Michael McCandless
author
The book shows how to take a hot backup of the index (i.e., backing up in the background even while an IndexWriter is still making changes to the index). The resulting backup is still a point-in-time copy of the index as of when the backup began.
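The idea behind a hot backup can be sketched with a toy model, using a hypothetical `ToyBackup` class. Lucene provides `SnapshotDeletionPolicy` for the real thing; this sketch does not use it and only models the idea. It relies on index files being write-once, as Lucene's are: a commit is a fixed list of files, so the backup only has to stop the writer from deleting those files while they are being copied.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical toy hot backup (NOT Lucene's API). Assumes write-once index files
// and only one backup running at a time.
class ToyBackup {
    private final Set<String> pinned = new HashSet<>();

    // The writer would ask this before deleting an obsolete index file.
    synchronized boolean mayDelete(String fileName) {
        return !pinned.contains(fileName);
    }

    // Copies the files of one commit while the writer keeps running.
    void backup(List<String> commitFiles, Path indexDir, Path backupDir) {
        synchronized (this) {
            pinned.addAll(commitFiles); // writer must not delete these during the copy
        }
        try {
            Files.createDirectories(backupDir);
            for (String name : commitFiles) {
                Files.copy(indexDir.resolve(name), backupDir.resolve(name),
                        StandardCopyOption.REPLACE_EXISTING);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } finally {
            synchronized (this) {
                pinned.removeAll(commitFiles); // release the snapshot
            }
        }
    }
}
```

Because the files are only pinned, not locked, the writer keeps indexing during the copy; it just postpones deleting obsolete files until the backup releases them, which is why the result is a point-in-time copy.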
 
Gian Franco
blacksmith
...is the hot backup a new feature?

I used to have a companion batch program for my
application that, in case of emergencies, would
re-index all the data.

Regarding performance of reindexing large data sets...
Lucene was not the bottleneck...finding the files and
extracting the text took most of the time...

Cheers,

Gian
 
Author

Gian Franco wrote: ...is the hot backup a new feature?

Yes, it's relatively new.

Otis
 