I am working on finding similar records in a large database.
I use Hibernate Search, Apache Lucene, and JPA to index and search the records in the database.
I want to enter a paragraph of text in an input field (an XHTML text box) and have Hibernate Search quickly return the closest matching records (say, the top 10).
I extract all keywords from the input (after removing stopwords and stemming) and search for those keywords against the database records. However, the search takes longer as the number of keywords grows.
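To make the keyword step concrete, here is a minimal sketch of it in plain Java. It is not my actual code: the class name KeywordExtractor, the method extract, and the tiny stopword list are hypothetical and for illustration only, and stemming is left out (in the real setup a Lucene analyzer would normally handle stopwords and stemming).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

public class KeywordExtractor {

    // Hypothetical, deliberately tiny stopword list (assumption, for illustration only).
    private static final Set<String> STOPWORDS = new HashSet<>(Arrays.asList(
            "a", "an", "and", "the", "of", "in", "is", "to", "for", "on"));

    // Lowercases the text, splits it on runs of non-alphanumeric characters,
    // drops stopwords and duplicates, and keeps first-occurrence order.
    // Stemming is intentionally omitted in this sketch.
    public static List<String> extract(String text) {
        Set<String> keywords = new LinkedHashSet<>();
        for (String token : text.toLowerCase(Locale.ROOT).split("[^a-z0-9]+")) {
            if (!token.isEmpty() && !STOPWORDS.contains(token)) {
                keywords.add(token);
            }
        }
        return new ArrayList<>(keywords);
    }

    public static void main(String[] args) {
        // Each keyword returned here becomes one term in the search.
        System.out.println(extract("The quick search of the database is quick"));
    }
}
```

Every keyword this returns becomes one more term to search for, so a longer input paragraph means a larger query.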
How can I approach or restructure this to get accurate results quickly? I feel that I should try to form phrases out of those keywords before searching, to reduce the number of comparisons. Any ideas or suggestions (algorithms) that would point me in the right direction would be really helpful.