• Post Reply Bookmark Topic Watch Topic
  • New Topic
programming forums Java Mobile Certification Databases Caching Books Engineering Micro Controllers OS Languages Paradigms IDEs Build Tools Frameworks Application Servers Open Source This Site Careers Other Pie Elite all forums
this forum made possible by our volunteer staff, including ...
Marshals:
  • Campbell Ritchie
  • Jeanne Boyarsky
  • Ron McLeod
  • Paul Clapham
  • Liutauras Vilda
Sheriffs:
  • paul wheaton
  • Rob Spoor
  • Devaka Cooray
Saloon Keepers:
  • Stephan van Hulst
  • Tim Holloway
  • Carey Brown
  • Frits Walraven
  • Tim Moores
Bartenders:
  • Mikalai Zaikin

Looking for suggestive search text algorithm

 
Greenhorn
Posts: 21
  • Mark post as helpful
  • send pies
    Number of slices to send:
    Optional 'thank-you' note:
  • Quote
  • Report post to moderator
Hi all,

I'm trying to find some information on how to implement a suggestive search text box. You know the ones that suggest possible words as you type.

Minimally I want to implement one that suggests words that start with the entered string. Ideally I will likely want to implement one that finds phrases in the middle of a word as well as the start. So if someone starts typing "ball" it would suggest not only "ball", "balloon", and "ballroom", but would also suggest "Football".

I have been trying to find an algorithm for doing something like this efficiently. My Google searches have not turned up much. But I think it is simply because I am not searching for the correct term. I've tried things like "Suggestive search Algorithm". It's a fairly common things these days, so I have to assume there are some best practices and optimized algorithms for doing it.

Can anyone point me in the right direction? If I at least know what the proper name for such is, that'd be helpful. But if anyone knows of an article or whitepaper that discusses how to implement this, I'd be thankful.

Thanks!

p.s. ultimately, the item being searched would be backed up by a large database containing a large list of keywords.
[ August 13, 2008: Message edited by: Lance Miller ]
 
Author and all-around good cowpoke
Posts: 13078
6
  • Mark post as helpful
  • send pies
    Number of slices to send:
    Optional 'thank-you' note:
  • Quote
  • Report post to moderator
"suggestive search algorithm" certainly sends my mind off in odd directions

Ahem

Anyway - what is going to be searched here - a database with a contolled vocabulary or something completely open ended? There is no sense in suggesting words that are not going to lead anywhere.

Is this for a HTML/web page or a desktop application?

For prefix based suggestions, your best bet is simply an alphabetical array, see the java.util.Arrays class for the binary search methods to locate a starting point. I would suggest waiting until the client types at least 3 characters, otherwise the list of suggestions would be huge.

The arbitrary character sequence - ie "ball" suggests football - is a big problem, better get the prefix working first.

Phonetic matching to suggest alternate spellings is another possibility, I did this one time for a client who needed to transcribe court hearings.

Bill
 
Lance Miller
Greenhorn
Posts: 21
  • Mark post as helpful
  • send pies
    Number of slices to send:
    Optional 'thank-you' note:
  • Quote
  • Report post to moderator
Thanks for the reply Bill.

Originally posted by William Brogden:

what is going to be searched here - a database with a controlled vocabulary or something completely open ended? There is no sense in suggesting words that are not going to lead anywhere.



It's going to be a product database. In particular, it's going to be searching the product names. So the vocabulary wont be completely controlled (i.e. enumerated values); but the suggested terms will be limited to what's in the database.


Originally posted by William Brogden:

Is this for a HTML/web page or a desktop application?


It may be both. The business analysts are still working out the requirements. But its looking like a "full featured" desktop application and a "limited featured" web app may be the desired functionality. The search feature would be a feature of both UIs. So I am hoping we can implement something common that the UI (either an Ajax call or a Swing event) calls. Do you think that is doable? Or is a separate implementation for each a better way to go?

Originally posted by William Brogden:

For prefix based suggestions, your best bet is simply an alphabetical array, see the java.util.Arrays class for the binary search methods to locate a starting point. I would suggest waiting until the client types at least 3 characters, otherwise the list of suggestions would be huge.


I also was thinking about waiting until 3 characters have been entered. Thanks for the tip on the java.util.Arrays class. I'll take a look.

Originally posted by William Brogden:

The arbitrary character sequence - ie "ball" suggests football - is a big problem, better get the prefix working first.



I figured this would be the harder one to implement. Unfortunately it is a highly desired feature and may be a required feature. Reason being is that for the product name search, if some searches for a "Widget", it is desired for all products with Widget in the name, such as "Acme Widget", "Company Xyz Widget", and just generic "Widget" to be displayed. And they want them displayed with the vendor name. Ultimately, if the user selects a specific product, information on that product is returned. If the user just searches for a Widget, then there is some logic involved as to which vendors' product are returned and in what order.


As I think about it now, I'm thinking we can split on spaces so the "Ball" search I mention above would work for "Foot Ball" but not "Football". That might make things easier. What do you think?

Then of course then there is the issue of making the search case insensitive.
[ August 13, 2008: Message edited by: Lance Miller ]
 
Marshal
Posts: 79076
376
  • Mark post as helpful
  • send pies
    Number of slices to send:
    Optional 'thank-you' note:
  • Quote
  • Report post to moderator
Databases???

That set my mind on all sorts of wild diversions . . .

Who needs searches when you can say

SELECT name FROM table WHERE name like '%ball%';

?

Does that help at all? You would need some sort of Listener (??Key??) if you are using Swing so as to get a response when letters are written in a field.
 
Lance Miller
Greenhorn
Posts: 21
  • Mark post as helpful
  • send pies
    Number of slices to send:
    Optional 'thank-you' note:
  • Quote
  • Report post to moderator
Hi Campbell,

The main point is that the user needs to see a list of suggestions as they type the word. As they type, the suggestion list gets smaller, like in the Google toolbar search box. I was thinking a database search (given internet latency) after each letter is typed wouldn't be responsive enough and that there would be a lag. I was thinking an initial database fetch based on the first two or three letters to limit the collection size, and then some type of algorithm to search that collection.

Do you think a fresh database query after each letter would be responsive enough, especially considering how fast some people type

I don't know, maybe I am making this more complex than it needs to be.
[ August 13, 2008: Message edited by: Lance Miller ]
 
Ranch Hand
Posts: 1296
  • Mark post as helpful
  • send pies
    Number of slices to send:
    Optional 'thank-you' note:
  • Quote
  • Report post to moderator
How about a Ternary Search Tree.
[ August 13, 2008: Message edited by: Garrett Rowe ]
 
Campbell Ritchie
Marshal
Posts: 79076
376
  • Mark post as helpful
  • send pies
    Number of slices to send:
    Optional 'thank-you' note:
  • Quote
  • Report post to moderator

Originally posted by Lance Miller:
Do you think a fresh database query after each letter would be responsive enough, especially considering how fast some people type

Probably not, but any "wild diversions" should be considered.
 
Ranch Hand
Posts: 192
  • Mark post as helpful
  • send pies
    Number of slices to send:
    Optional 'thank-you' note:
  • Quote
  • Report post to moderator

Originally posted by Lance Miller:

Do you think a fresh database query after each letter would be responsive enough, especially considering how fast some people type




How about searching after a pause in typing? A few hundred milliseconds should do.
When the customer pauses is probably when he/she's interested in your suggestions anyway.
 
William Brogden
Author and all-around good cowpoke
Posts: 13078
6
  • Mark post as helpful
  • send pies
    Number of slices to send:
    Optional 'thank-you' note:
  • Quote
  • Report post to moderator
The restricted vocabulary domain of a product database will certainly make your job easier.

I doubt that dispatching a query to a separate database will be fast enough for the "ball" finds "football" problem.

Any vocabulary based suggestion mechanism should certainly be usable for both web and desktop.

Exactly how many product names are we talking about here? You might be surprised at how fast Java's regex pattern matching can locate partial matches.

Bill
PS - watch out for tying up your GUI event Thread. Searchs should run in separate Threads asynchronously.
[ August 14, 2008: Message edited by: William Brogden ]
 
Lance Miller
Greenhorn
Posts: 21
  • Mark post as helpful
  • send pies
    Number of slices to send:
    Optional 'thank-you' note:
  • Quote
  • Report post to moderator
Hi all,

Thank you all very much for the replies! The suggestions and discussions have been very helpful. Please keep them coming.


Originally posted by Garrett Rowe:
How about a Ternary Search Tree.



Thanks Garrett! That does look interesting and promising. I only did a quick read of it at this point, but it looks very promising and has provided me some more things to search on and research.

It's been a long time since my Data Structures and Algorithms class and I haven't needed to do much more than basic stuff until now. So I am very rusty. I'll have to dust of my Robert Sedgewick book and so some more reading on the subject.


Originally posted by Taariq San:
How about searching after a pause in typing? A few hundred milliseconds should do.
When the customer pauses is probably when he/she's interested in your suggestions anyway.



Interesting idea. Thanks for the suggestion.

Originally posted by William Brogden:
Exactly how many product names are we talking about here? You might be surprised at how fast Java's regex pattern matching can locate partial matches.



We're talking a few hundred-thousand products. I'm a regex proponent, so I definitely will look into using that in the solution if appropriate.

Originally posted by William Brogden:
PS - watch out for tying up your GUI event Thread. Searchs should run in separate Threads asynchronously.



I was thinking the same thing. Thanks for the reminder. I have found some solutions for Auto Complete JTextFeild implementations including one at Swing Labs. I need to look at them in more detail to verify the thread isolation. Most of them look like they are designed for more limited data sets. I need to look some more to see if that was just for demonstration purposes or an actual design limitation. If the latter, I'll need to modify it so the list of words is dynamically updated on a separate thread. Thanks Bill, you have been very helpful. If you think of anything additional, please let me know.
 
Ranch Hand
Posts: 83
Spring Tomcat Server Java
  • Mark post as helpful
  • send pies
    Number of slices to send:
    Optional 'thank-you' note:
  • Quote
  • Report post to moderator
How about a Search engine based approach. For example try Lucene search engine...here is what would be my approach...

1. Create an index on file system for the data you would want to make suggestions. So say in this case keep product and some of its attributes in the Lucene index file.

2. Whenever your event is fired try searching the index using Lucene search API which is going to give very quick response. As it does some in memory optimization for searching based on its ranking and some algorithms...

3. The search results may contain lot of data out of which you may want to show initial few which would be more relevant in terms of keyword matching...

4. You can generate your search queries in such a way that you get more relevant results.....Say if a user is logged in from California and you want him to show products related to him....try to add few more criteria at runtime to the query so that it fetches more relevant results....


We are doing lot of big data (more then 200GB index) searches using Lucene search engine .... and its really fast as compared to doing direct database query with multiple criteria and wild cards.....
 
Lance Miller
Greenhorn
Posts: 21
  • Mark post as helpful
  • send pies
    Number of slices to send:
    Optional 'thank-you' note:
  • Quote
  • Report post to moderator

Originally posted by Sachin Joshi:
How about a Search engine based approach. For example try Lucene search engine
...

We are doing lot of big data (more then 200GB index) searches using Lucene search engine .... and its really fast as compared to doing direct database query with multiple criteria and wild cards.....



Thanks for the information and your experiences. I have, of course, heard of Lucene but would never had thought of using. I never would have thought it was that responsive. I'll add to the list of things to experiment with.

Thanks again for taking the time to reply.
 
Ranch Hand
Posts: 60
  • Mark post as helpful
  • send pies
    Number of slices to send:
    Optional 'thank-you' note:
  • Quote
  • Report post to moderator
You might also be interested in Xapian which is a C++ search with Java bindings.
 
Taariq San
Ranch Hand
Posts: 192
  • Mark post as helpful
  • send pies
    Number of slices to send:
    Optional 'thank-you' note:
  • Quote
  • Report post to moderator

Originally posted by Lance Miller:
I need to look at them in more detail to verify the thread isolation.



If you're using swing this is as simple as SwingUtils.invokeLater.
It just kicks off another thread to do the work instead of waiting for the response, so the user keeps typing.
 
Greenhorn
Posts: 1
  • Mark post as helpful
  • send pies
    Number of slices to send:
    Optional 'thank-you' note:
  • Quote
  • Report post to moderator
Hello Lance,

Wondering if you can share your experiences on how you went about doing the final implementation.

 
Campbell Ritchie
Marshal
Posts: 79076
376
  • Mark post as helpful
  • send pies
    Number of slices to send:
    Optional 'thank-you' note:
  • Quote
  • Report post to moderator
Welcome to the Ranch

He might not be reading any more, so you might not get a reply.

[My 15000th post ]
 
Sheriff
Posts: 22780
131
Eclipse IDE Spring VI Editor Chrome Java Windows
  • Mark post as helpful
  • send pies
    Number of slices to send:
    Optional 'thank-you' note:
  • Quote
  • Report post to moderator
Well, Lance did post just 2 weeks ago so he's still active on the ranch. He may not read this forum, his last posts here are in this thread.
 
Campbell Ritchie
Marshal
Posts: 79076
376
  • Mark post as helpful
  • send pies
    Number of slices to send:
    Optional 'thank-you' note:
  • Quote
  • Report post to moderator
Thank you for the information, Rob.

It might be worth PM-ing Lance to draw his attention to your question, N. Ravishankar.
reply
    Bookmark Topic Watch Topic
  • New Topic