• Post Reply Bookmark Topic Watch Topic
  • New Topic
programming forums Java Mobile Certification Databases Caching Books Engineering Micro Controllers OS Languages Paradigms IDEs Build Tools Frameworks Application Servers Open Source This Site Careers Other all forums
this forum made possible by our volunteer staff, including ...
  • Liutauras Vilda
  • Campbell Ritchie
  • Tim Cooke
  • Bear Bibeault
  • Devaka Cooray
  • Jeanne Boyarsky
  • Knute Snortum
  • Junilu Lacar
Saloon Keepers:
  • Tim Moores
  • Ganesh Patekar
  • Stephan van Hulst
  • Pete Letkeman
  • Carey Brown
  • Tim Holloway
  • Ron McLeod
  • Vijitha Kumara

Search Strategy  RSS feed

Ranch Hand
Posts: 221
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
Hi Folks,

SQL isn't really my thing, I know how to do the usual things, but I'm no expert. I wonder if anyone could comment/provide suggestions on this?

(Using Postgres 7.1)

There is a table in the database, Site, which has fields: address_1, address_2, address_3 and postalcode.

One of the problems is, the address fields are filled out inconsistently. So for example, the third line may be empty, but the second contain the city. You get the idea.

Now I need a search which takes address lines 1, 2, 3 and the postcode as search terms, finds matches and ranks them. For example, the user entered search terms may be:

Address 1: New Rose Hotel
Address 2: Gibson Street
Address 3: GibsonTown
Post code: WGB 123

I have a query which works, but seems really ugly. I concatenate lines 1, 2 and 3 of the search query and tokenize into individual words. I then filter out anything less than 3 characters (generally Rd. St. etc) and any really common tokens (street, road, lane, place etc). I also filter punctuation and whitespace from the post code and limit to 5 characters. This leaves me with search criteria like this:

New, Rose, Hotel, Gibson, GibsonTown and WGB12

Now I build a query. The query works by concatenating the address lines (using coalesce to turn NULL in an empty string), then doing a case insensitive regular expression match with word boundaries. I generate one such sub query per token, then UNION ALL the results.

To add a ranking system, I add a constant value of '1 point' for each token match, and '3 points' for a postcode match.

The whole lot is wrapped in another query which LEFT JOINs with the Site table to get the info I want. This outer wrapper is really just for testing, as I just need the primary keys in my application.

So, our example query would look like this:

Seems a bit complicated! Any SQL wizards want to let me in on a secret of how to do this more simply/efficiently?

Much appreciated!
[ April 05, 2005: Message edited by: Horatio Westock ]
With a little knowledge, a cast iron skillet is non-stick and lasts a lifetime.
  • Post Reply Bookmark Topic Watch Topic
  • New Topic
Boost this thread!