# Find Top X Algorithm

David Soloman

Greenhorn

Posts: 2

posted 10 years ago

Hi all

I have this interesting problem. Assume 20 items (A,B,C,D etc) are owned by a group of 10,000 people in various combinations (permutations are not important). Something like

OwnerId, ItemId

1,A

1,C

2,A

2,B

2,C

and so on.

I need to know which top 10 items account for most number of people?

Thanks.

I have this interesting problem. Assume 20 items (A,B,C,D etc) are owned by a group of 10,000 people in various combinations (permutations are not important). Something like

OwnerId, ItemId

1,A

1,C

2,A

2,B

2,C

and so on.

I need to know which top 10 items account for most number of people?

Thanks.

Ulf Dittmer

Rancher

Posts: 42968

73

David Soloman

Greenhorn

Posts: 2

posted 10 years ago

The approach I have in mind is:

1. Determine combinations in data with a maximum of 10 items (ignore >10 combinations) and their popularity (#owners)

2. Pick the first combination - determine #owners for all sub-combinations existing in data

3. Choose up to 5 items starting from most popular to least popular combinations

The problem: Although not bruce force, I feel this is still requires a large computation. What do you think?

1. Determine combinations in data with a maximum of 10 items (ignore >10 combinations) and their popularity (#owners)

2. Pick the first combination - determine #owners for all sub-combinations existing in data

3. Choose up to 5 items starting from most popular to least popular combinations

The problem: Although not bruce force, I feel this is still requires a large computation. What do you think?

posted 10 years ago

One excellent problem solving technique (from Polya's How To Solve It) is to simplify the problem.

How would you solve this if you only had to pick the top ONE item? possibly even from a group of 10 people?

get a good clean method for doing that, then see if you can expand it to more people/items.

[ November 25, 2005: Message edited by: fred rosenberger ]

How would you solve this if you only had to pick the top ONE item? possibly even from a group of 10 people?

get a good clean method for doing that, then see if you can expand it to more people/items.

[ November 25, 2005: Message edited by: fred rosenberger ]

There are only two hard things in computer science: cache invalidation, naming things, and off-by-one errors

posted 10 years ago

I would just dump those ownership pairs (less than 200,000 of them) into a database and do

SELECT ItemID, Count(*) as number FROM table GROUP BY ItemID ORDER BY number desc

Sure, it's a "big" problem but the database should be able to do that query in a couple of seconds.

SELECT ItemID, Count(*) as number FROM table GROUP BY ItemID ORDER BY number desc

Sure, it's a "big" problem but the database should be able to do that query in a couple of seconds.

Virag Saksena

Ranch Hand

Posts: 71

posted 10 years ago

Create a hash map/table for the owned elements.

for every input pair

probe the hash table with key = item

if the key exists, get the value, increment it, and put it back

if the key does not exist, create a value 1 and put it in the hash table

So with n put/get operations, you will have a list of your counts

Then enumerate the hash table, put the values in a array and sort them

In the worst case you go to O(n log n). If the number of items is much smaller than number of entries, you essentially will get the result in O(n)

for every input pair

probe the hash table with key = item

if the key exists, get the value, increment it, and put it back

if the key does not exist, create a value 1 and put it in the hash table

So with n put/get operations, you will have a list of your counts

Then enumerate the hash table, put the values in a array and sort them

In the worst case you go to O(n log n). If the number of items is much smaller than number of entries, you essentially will get the result in O(n)

<a href="http://www.auptyma.com" target="_blank" rel="nofollow">The Peak of Performance</a>