• Post Reply Bookmark Topic Watch Topic
  • New Topic

Find Top X Algorithm  RSS feed

 
David Soloman
Greenhorn
Posts: 2
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
Hi all

I have this interesting problem. Assume 20 items (A,B,C,D etc) are owned by a group of 10,000 people in various combinations (permutations are not important). Something like

OwnerId, ItemId
1,A
1,C
2,A
2,B
2,C
and so on.

I need to know which top 10 items account for most number of people?

Thanks.
 
Ulf Dittmer
Rancher
Posts: 42970
73
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
So you write an algorithm to compute the answer, which is pretty straightforward. What exactly is your question? Which ways of solving it have you thought of so far?
 
David Soloman
Greenhorn
Posts: 2
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
The approach I have in mind is:

1. Determine combinations in data with a maximum of 10 items (ignore >10 combinations) and their popularity (#owners)
2. Pick the first combination - determine #owners for all sub-combinations existing in data
3. Choose up to 5 items starting from most popular to least popular combinations

The problem: Although not bruce force, I feel this is still requires a large computation. What do you think?
 
fred rosenberger
lowercase baba
Bartender
Posts: 12442
42
Chrome Java Linux
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
One excellent problem solving technique (from Polya's How To Solve It) is to simplify the problem.

How would you solve this if you only had to pick the top ONE item? possibly even from a group of 10 people?

get a good clean method for doing that, then see if you can expand it to more people/items.
[ November 25, 2005: Message edited by: fred rosenberger ]
 
Paul Clapham
Sheriff
Posts: 22185
38
Eclipse IDE Firefox Browser MySQL Database
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
I would just dump those ownership pairs (less than 200,000 of them) into a database and do

SELECT ItemID, Count(*) as number FROM table GROUP BY ItemID ORDER BY number desc

Sure, it's a "big" problem but the database should be able to do that query in a couple of seconds.
 
Virag Saksena
Ranch Hand
Posts: 71
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
Create a hash map/table for the owned elements.
for every input pair
probe the hash table with key = item
if the key exists, get the value, increment it, and put it back
if the key does not exist, create a value 1 and put it in the hash table
So with n put/get operations, you will have a list of your counts
Then enumerate the hash table, put the values in a array and sort them

In the worst case you go to O(n log n). If the number of items is much smaller than number of entries, you essentially will get the result in O(n)
 
It is sorta covered in the JavaRanch Style Guide.
  • Post Reply Bookmark Topic Watch Topic
  • New Topic
Boost this thread!