Lucene - get TermVector positions

 
Ranch Hand
Posts: 35
Hello,

re. Lucene:

I am indexing text using Field.TermVector.WITH_POSITIONS in order to use it for highlighting and other post processing. I have not been able to find out how to access this information during the search.

Can anyone give me a pointer?

Thanks,

Allasso
 
Bartender
Posts: 1210
Try the IndexReader.getTermFreqVector() methods.
 
Allasso Travesser
Ranch Hand
Posts: 35
Yes, I have tried that; however, I get a compile-time error:

non-static method getTermFreqVector(int,java.lang.String) cannot be referenced from a static context.

Since IndexReader is an abstract class, I can't instantiate it either.

Is there something I am missing here?

thanks for your reply, Allasso
 
Karthik Shiraly
Bartender
Posts: 1210
Hi Allasso,

A concrete IndexReader object is constructed the normal way, i.e., by calling its static factory method, IndexReader.open(), and keeping the returned reference in a variable such as reader.

Note: Ensure that your writers have been close()d before getting a reader. Lucene has a kind of versioning concept for the index; to use the latest version, all the writers should be closed.

Perhaps IndexReader's termPositions() method may also prove useful to you.

Cheers
Karthik
 
Allasso Travesser
Ranch Hand
Posts: 35
Thank you X 100, Karthik. I was turning grey over that one.

Since IndexReader is an abstract class, I assumed it could not be instantiated. The Interfaces and Inheritance section of the Sun Java tutorial reads:

"An abstract class is a class that is declared abstract—it may or may not include abstract methods. Abstract classes cannot be instantiated, but they can be subclassed."

I guess I need to read up on factory methods. Is "reader" in your example considered an object, or is it considered something else? Is using the open() method a way of subclassing IndexReader?

thanks again,

Allasso
 
Karthik Shiraly
Bartender
Posts: 1210
Hi Allasso,

'IndexReader.open()' internally creates an object (i.e., an instance) of a concrete (non-abstract) subclass of IndexReader, using the new operator.
'reader' is a local variable that holds a reference to that object.
'open()' does not subclass IndexReader; it is just a method that reads a Lucene index, and one of its steps is to create an object of a concrete subclass of IndexReader. Methods that internally choose a particular subclass to instantiate and return a reference to that instance are called 'factory methods'. It's a design pattern (DesignPatternFaq).
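For anyone who wants to see the idea in isolation, here is a minimal plain-Java sketch of a factory method on an abstract class. It is not Lucene code: SketchReader, FolderReader, ArchiveReader and the ".zip" rule are all hypothetical names made up for illustration, and Lucene's own open() may choose its subclass quite differently.

```java
// Hypothetical stand-in for an abstract class such as IndexReader.
// None of these names come from Lucene.
abstract class SketchReader {
    // Instance method: it can only be called on an object, so writing
    // SketchReader.describe() fails with the "non-static method ...
    // cannot be referenced from a static context" compile error.
    abstract String describe();

    // Static factory method: chooses a concrete subclass, instantiates it
    // with new, and returns the reference typed as the abstract base class.
    static SketchReader open(String path) {
        if (path.endsWith(".zip")) {
            return new ArchiveReader(path);
        }
        return new FolderReader(path);
    }
}

// Concrete subclass chosen for ordinary paths (assumed rule).
final class FolderReader extends SketchReader {
    private final String path;

    FolderReader(String path) {
        this.path = path;
    }

    @Override
    String describe() {
        return "folder:" + path;
    }
}

// Concrete subclass chosen for paths ending in ".zip" (assumed rule).
final class ArchiveReader extends SketchReader {
    private final String path;

    ArchiveReader(String path) {
        this.path = path;
    }

    @Override
    String describe() {
        return "archive:" + path;
    }
}

class FactoryMethodDemo {
    public static void main(String[] args) {
        // The caller never writes "new"; it only sees the abstract type.
        SketchReader reader = SketchReader.open("index");
        System.out.println(reader.describe());                 // prints "folder:index"
        System.out.println(reader.getClass().getSimpleName()); // prints "FolderReader"
    }
}
```

The caller holds a reference of the abstract type, while the object behind it is an instance of whichever concrete subclass the factory picked.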

Cheers
Karthik
 
Allasso Travesser
Ranch Hand
Posts: 35
thank you, Karthik,

sometimes just a few words in the right direction can get one on his way to some productive learning and save a lot of head banging.

I appreciate your thoughtfulness.

Allasso
 
Allasso Travesser
Ranch Hand
Posts: 35
Examples work really well for me, so I like to post the successful fruits for the benefit of others in the future...

This will print both the term positions (eg, the nth word in the original indexed content) and the beginning and ending character offsets of the queried term.

NOTE: This example only works for a single query term, otherwise you need to iterate over the query terms as noted in the "display terms" section below.
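To make the two kinds of information concrete, here is a small plain-Java sketch that does not use Lucene at all. It splits text into word tokens with a regular expression and reports, for a single term, its position (the nth token, counted from zero here by assumption) and its start and end character offsets. TermHit and TermLocator are hypothetical names, and a real analyzer may tokenize and number positions differently.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical value class: one occurrence of a term in the text.
final class TermHit {
    final int position;    // zero-based index of the token in the text
    final int startOffset; // character offset of the first character
    final int endOffset;   // character offset just past the last character

    TermHit(int position, int startOffset, int endOffset) {
        this.position = position;
        this.startOffset = startOffset;
        this.endOffset = endOffset;
    }
}

// Hypothetical helper: a stand-in for what a positional term vector stores.
final class TermLocator {
    private static final Pattern WORD = Pattern.compile("\\w+");

    // Returns every occurrence of a single term, compared case-insensitively.
    static List<TermHit> locate(String text, String term) {
        List<TermHit> hits = new ArrayList<>();
        String wanted = term.toLowerCase(Locale.ROOT);
        Matcher m = WORD.matcher(text);
        int position = 0;
        while (m.find()) {
            if (m.group().toLowerCase(Locale.ROOT).equals(wanted)) {
                hits.add(new TermHit(position, m.start(), m.end()));
            }
            position++; // every token advances the position by one
        }
        return hits;
    }
}

class TermLocatorDemo {
    public static void main(String[] args) {
        String text = "the quick fox jumps over the lazy fox";
        for (TermHit hit : TermLocator.locate(text, "fox")) {
            System.out.println("position=" + hit.position
                    + " offsets=" + hit.startOffset + "-" + hit.endOffset);
        }
        // prints "position=2 offsets=10-13" then "position=7 offsets=34-37"
    }
}
```

As in the note above, this handles one term at a time; for several query terms you would call locate once per term.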

 