Win a copy of Head First Agile this week in the Agile forum!
programming forums Java Java JSRs Mobile Certification Databases Caching Books Engineering OS Languages Paradigms IDEs Build Tools Frameworks Products This Site Careers Other all forums
this forum made possible by our volunteer staff, including ...
Marshals:
Sheriffs:
Saloon Keepers:
Bartenders:

# Find Top X Algorithm

David Soloman
Greenhorn
Posts: 2
Hi all

I have this interesting problem. Assume 20 items (A,B,C,D etc) are owned by a group of 10,000 people in various combinations (permutations are not important). Something like

OwnerId, ItemId
1,A
1,C
2,A
2,B
2,C
and so on.

I need to know which top 10 items account for most number of people?

Thanks.

Ulf Dittmer
Rancher
Posts: 42972
73
So you write an algorithm to compute the answer, which is pretty straightforward. What exactly is your question? Which ways of solving it have you thought of so far?

David Soloman
Greenhorn
Posts: 2
The approach I have in mind is:

1. Determine combinations in data with a maximum of 10 items (ignore >10 combinations) and their popularity (#owners)
2. Pick the first combination - determine #owners for all sub-combinations existing in data
3. Choose up to 5 items starting from most popular to least popular combinations

The problem: Although not bruce force, I feel this is still requires a large computation. What do you think?

fred rosenberger
lowercase baba
Bartender
Posts: 12542
48
One excellent problem solving technique (from Polya's How To Solve It) is to simplify the problem.

How would you solve this if you only had to pick the top ONE item? possibly even from a group of 10 people?

get a good clean method for doing that, then see if you can expand it to more people/items.
[ November 25, 2005: Message edited by: fred rosenberger ]

Paul Clapham
Sheriff
Posts: 22531
43
I would just dump those ownership pairs (less than 200,000 of them) into a database and do

SELECT ItemID, Count(*) as number FROM table GROUP BY ItemID ORDER BY number desc

Sure, it's a "big" problem but the database should be able to do that query in a couple of seconds.

Virag Saksena
Ranch Hand
Posts: 71
Create a hash map/table for the owned elements.
for every input pair
probe the hash table with key = item
if the key exists, get the value, increment it, and put it back
if the key does not exist, create a value 1 and put it in the hash table
So with n put/get operations, you will have a list of your counts
Then enumerate the hash table, put the values in a array and sort them

In the worst case you go to O(n log n). If the number of items is much smaller than number of entries, you essentially will get the result in O(n)