
representation problem

 
Puneet N Vyas
Ranch Hand
Posts: 61
74731 74748 74783 109554 1
74731 74747 105488 13 30 65 34836 37601
74731 74736 92326 13 29 30770 37613 37630 37665 71994
74731 74756 74957 13 18 17608 37613 37629 67992
74731 74760 74782 95201 13 38 239 37613 37618 55002
74725 74774 78147 13 42 64 20483 37613 37638 37839
74731 74736 111385 7 56 3429 37613 37642 37664 57829
74723 74744 74867 13 18 36667 37607 37656 40995

I have to implement a data mining algorithm that mines frequent subsequences as patterns from the data shown above. The data is a typical sequence database in which each row represents a unique sequence.
Each sequence consists of a list of item sets (called transactions). For example, the first sequence contains 5 transactions, and its first transaction contains the item set {7,4,7,3,1}, and so on.
Now I need to count the number of times 7 occurs in the first row, the second row, and so on. The support count of an item, say 7, is defined as the number of rows that contain it. For example, the support count of 7 is 8: if an item occurs in multiple transactions of one sequence, as it does in the first sequence, it counts only once toward its support. How do I map this in Java? Should I use collections, or something else? Any suggestions are welcome.

Thanks for your support.
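
To make the representation concrete, here is a minimal sketch of reading one row into transactions. It assumes that each whitespace-separated token is one transaction and that each digit in a token is one item, which is how the {7,4,7,3,1} example reads. The class and method names (SequenceParser, parseRow) are hypothetical and not from the thread.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical helper: one row of text -> list of transactions (item sets).
class SequenceParser {

    // Assumption: tokens are transactions, digits are items.
    // Duplicates inside a token collapse, e.g. "74731" -> [7, 4, 3, 1].
    static List<Set<Integer>> parseRow(String row) {
        List<Set<Integer>> transactions = new ArrayList<>();
        for (String token : row.trim().split("\\s+")) {
            if (token.isEmpty()) {
                continue; // a blank row yields no transactions
            }
            Set<Integer> items = new LinkedHashSet<>();
            for (char c : token.toCharArray()) {
                if (c >= '0' && c <= '9') {
                    items.add(c - '0');
                }
            }
            transactions.add(items);
        }
        return transactions;
    }

    public static void main(String[] args) {
        // First row of the data posted above.
        List<Set<Integer>> first = parseRow("74731 74748 74783 109554 1");
        System.out.println(first.size() + " transactions: " + first);
    }
}
```

A LinkedHashSet is used only so the items print in the order they first appear; any Set would do for counting.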
 
Mike Simmons
Ranch Hand
Posts: 3090
Are you familiar with Maps? This seems like a strange question for the Advanced forum. A Collection won't help you here, but a Map would work. Use a Character or Integer as the key ('7' or 7) and an Integer as the value, representing the support count. To update a support count you have to get() the value, increment it, and then put() the new value into the map.
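
A minimal sketch of the Map approach, under the assumption that every digit in a row is an item. Because the original post counts an item at most once per row, the sketch first collects the distinct items of a row in a Set and only then does the get/increment/put step. The names MapSupportCount and supportCounts are made up for illustration.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

// Hypothetical example class; not code from the thread.
class MapSupportCount {

    // Key: item (a digit 0-9). Value: number of rows that contain the item.
    static Map<Integer, Integer> supportCounts(List<String> rows) {
        Map<Integer, Integer> support = new TreeMap<>();
        for (String row : rows) {
            // Distinct items of this row, so repeats within a row count once.
            Set<Integer> seenInRow = new HashSet<>();
            for (char c : row.toCharArray()) {
                if (c >= '0' && c <= '9') {
                    seenInRow.add(c - '0');
                }
            }
            // get(), increment, put(), as described above.
            for (Integer item : seenInRow) {
                Integer old = support.get(item);
                support.put(item, old == null ? 1 : old + 1);
            }
        }
        return support;
    }

    public static void main(String[] args) {
        // First three rows of the data posted above.
        List<String> rows = Arrays.asList(
                "74731 74748 74783 109554 1",
                "74731 74747 105488 13 30 65 34836 37601",
                "74731 74736 92326 13 29 30770 37613 37630 37665 71994");
        System.out.println(supportCounts(rows));
    }
}
```

On Java 8 or later, the get/increment/put step can be shortened to support.merge(item, 1, Integer::sum).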

You could also use an int[] array, where the index of the array is the numeric value of the item. So the support count for 0 would be supportCount[0], and the support count for 7 would be supportCount[7]. Currently it looks like you have no more than ten different items (0-9), so the size of the array would be 10. I think the array is both simpler to use and faster.
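
The array variant might look like the sketch below. It assumes the items are exactly the digits 0-9, and it uses a second boolean[] per row so that an item is counted once per row, as the original post requires. The array is called supportCount here; the class name ArraySupportCount is hypothetical.

```java
// Hypothetical example class; not code from the thread.
class ArraySupportCount {

    // supportCount[i] = number of rows in which digit i occurs at least once.
    static int[] supportCounts(String[] rows) {
        int[] supportCount = new int[10];
        for (String row : rows) {
            boolean[] seen = new boolean[10]; // reset for every row
            for (int i = 0; i < row.length(); i++) {
                char c = row.charAt(i);
                if (c >= '0' && c <= '9' && !seen[c - '0']) {
                    seen[c - '0'] = true;
                    supportCount[c - '0']++;
                }
            }
        }
        return supportCount;
    }

    public static void main(String[] args) {
        // The eight rows posted at the top of the thread.
        String[] rows = {
            "74731 74748 74783 109554 1",
            "74731 74747 105488 13 30 65 34836 37601",
            "74731 74736 92326 13 29 30770 37613 37630 37665 71994",
            "74731 74756 74957 13 18 17608 37613 37629 67992",
            "74731 74760 74782 95201 13 38 239 37613 37618 55002",
            "74725 74774 78147 13 42 64 20483 37613 37638 37839",
            "74731 74736 111385 7 56 3429 37613 37642 37664 57829",
            "74723 74744 74867 13 18 36667 37607 37656 40995"
        };
        int[] supportCount = supportCounts(rows);
        // Every row contains a 7, so this prints 8, matching the original post.
        System.out.println("support of 7 = " + supportCount[7]);
    }
}
```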
[ March 09, 2008: Message edited by: Mike Simmons ]
 
Nicholas Jordan
Ranch Hand
Posts: 1282
I like to use maps for everything: I just call random.nextInt() for a key and then use the map like a list if I have to. But the poster's question looks like a List of OrderedLists, or an OrderedList of OrderedLists. I will be glad to look it up for the poster and make suggestions, but this resembles a database, and there is no universal database class in Java; people always go to an external database implementation, through JDBC or something...

The poster specifies that each row represents a unique sequence, so it sounds like a set of lists. A collection in which each value is unique is a Set, each row is an ordered sequence, and there are no keys, so in Java Collections parlance that is a Set of Lists.
[ March 09, 2008: Message edited by: Nicholas Jordan ]
 
Mike Simmons
Ranch Hand
Posts: 3090
Originally posted by Nicholas Jordan:
The poster specifies that each row represents a unique sequence, so it sounds like a set of lists. A collection in which each value is unique is a Set, each row is an ordered sequence, and there are no keys, so in Java Collections parlance that is a Set of Lists.


I agree that a Set of Lists might be useful at some point in this problem, but if you go on to look carefully at the poster's definition of "support count", he needs to count how many lines contain a 0, how many lines contain a 1, etc. Using a Set of Lists may give you a place to store the data (if you need it - I would just process one line at a time), but it doesn't help with the counting. That's what the Map or array is for: to store the count for each element. There are other ways to do it, but this seems pretty simple and fast.
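
Processing one line at a time, without storing the rows, could be sketched as follows. The sketch reads from any java.io.Reader (a FileReader for a real file, a StringReader in the demo) and keeps only the ten counters in memory. It assumes digits 0-9 are the items; StreamingSupportCount and supportCounts are hypothetical names.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;

// Hypothetical example class; not code from the thread.
class StreamingSupportCount {

    // Reads rows one by one; nothing but the counters is retained.
    static int[] supportCounts(Reader source) {
        int[] supportCount = new int[10];
        try (BufferedReader in = new BufferedReader(source)) {
            String line;
            while ((line = in.readLine()) != null) {
                boolean[] seen = new boolean[10]; // items seen in this row
                for (char c : line.toCharArray()) {
                    if (c >= '0' && c <= '9' && !seen[c - '0']) {
                        seen[c - '0'] = true;
                        supportCount[c - '0']++;
                    }
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return supportCount;
    }

    public static void main(String[] args) {
        // Two rows from the thread, supplied as in-memory text for the demo.
        String data = "74731 74748 74783 109554 1\n"
                + "74731 74747 105488 13 30 65 34836 37601\n";
        int[] counts = supportCounts(new StringReader(data));
        for (int item = 0; item < counts.length; item++) {
            System.out.println(item + " -> " + counts[item]);
        }
    }
}
```

If the rows are needed again later, for example to mine the subsequences themselves, they would have to be stored after all, and that is where a List or Set of Lists comes in.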
 