
Finding Most Common Phrase Occurrence In String?

 
Justin Filmer
Greenhorn
Posts: 27
Hey guys, this is an interesting problem.
Let's say I have a string like this:


I want to be able to pick out the two- or three-word phrases that occur most often and second-most often in the string, while ignoring common words such as "I", "and", "is", etc. In the example string I provided, the most common phrase returned by the method should be "coding algorithms", and the second-most common should be "love writing code".

Any ideas or code samples for how to do this? My first thought is to remove the common words, then use some kind of dictionary that tracks the relative percentage of every run of consecutive words (every candidate phrase), and finally pick the two highest percentages from the dictionary. Now, how can we actually turn that into Java code?
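Here is one way that plan could look in Java, as a minimal sketch rather than a definitive implementation. Assumptions not taken from the thread: Java 8+, a small made-up stop-word list, words lower-cased and split on anything that isn't a letter or apostrophe, and a made-up sample string (the original example didn't survive). It counts raw occurrences instead of percentages; dividing every count by one common total would not change the ranking. Because a three-word phrase can never occur more often than the two-word phrases inside it, the sketch breaks ties in favour of longer phrases and skips any phrase that overlaps one already chosen. That tie-breaking rule is an assumption too, not something from the thread.

```java
import java.util.*;

public class CommonPhrases {

    // Hypothetical stop-word list; extend it to suit your own text.
    private static final Set<String> STOP_WORDS = new HashSet<>(Arrays.asList(
            "i", "and", "is", "a", "an", "the", "to", "of", "in", "it", "my"));

    /** Counts every run of minLen..maxLen consecutive words left after removing stop words. */
    static Map<String, Integer> countPhrases(String text, int minLen, int maxLen) {
        List<String> words = new ArrayList<>();
        for (String w : text.toLowerCase(Locale.ROOT).split("[^a-z']+")) {
            if (!w.isEmpty() && !STOP_WORDS.contains(w)) {
                words.add(w);
            }
        }
        Map<String, Integer> counts = new HashMap<>();
        for (int len = minLen; len <= maxLen; len++) {
            for (int start = 0; start + len <= words.size(); start++) {
                String phrase = String.join(" ", words.subList(start, start + len));
                counts.merge(phrase, 1, Integer::sum);
            }
        }
        return counts;
    }

    // True if one phrase occurs as whole words inside the other (or they are equal).
    private static boolean overlaps(String a, String b) {
        return (" " + a + " ").contains(" " + b + " ")
                || (" " + b + " ").contains(" " + a + " ");
    }

    private static int wordCount(String phrase) {
        return phrase.split(" ").length;
    }

    /** Returns up to k two- or three-word phrases, most frequent first. */
    static List<String> topPhrases(String text, int k) {
        List<Map.Entry<String, Integer>> entries =
                new ArrayList<>(countPhrases(text, 2, 3).entrySet());
        entries.sort((a, b) -> {
            int byCount = Integer.compare(b.getValue(), a.getValue());  // higher count first
            if (byCount != 0) return byCount;
            int byLength = Integer.compare(wordCount(b.getKey()), wordCount(a.getKey())); // longer first
            if (byLength != 0) return byLength;
            return a.getKey().compareTo(b.getKey());  // alphabetical, so output is deterministic
        });
        List<String> result = new ArrayList<>();
        for (Map.Entry<String, Integer> e : entries) {
            if (result.size() == k) break;
            // Skip sub-phrases (or super-phrases) of something already chosen,
            // so "love writing" is not reported next to "love writing code".
            boolean redundant = false;
            for (String chosen : result) {
                if (overlaps(chosen, e.getKey())) {
                    redundant = true;
                    break;
                }
            }
            if (!redundant) result.add(e.getKey());
        }
        return result;
    }

    public static void main(String[] args) {
        // Made-up sample; the string from the original post was not preserved.
        String text = "I love writing code. Coding algorithms is fun, and coding algorithms is hard. "
                + "I love writing code and coding algorithms.";
        System.out.println(topPhrases(text, 2)); // [coding algorithms, love writing code]
    }
}
```

Because stop words are removed before phrases are formed, a phrase can span a removed word or even a sentence break (the sample yields "writing code coding", for example); splitting the text into sentences first would avoid the latter.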
 
Justin Filmer
Greenhorn
Posts: 27
Thank you, Ulf Dittmer, for fixing my String! Does anyone have ideas for the algorithm/coding side of the problem?
 
Rob Spoor
Sheriff
Posts: 21135
If you had followed SearchFirst, you'd have found a few similar threads. In the most recent one I came across, I suggested separating the problem into two sub-problems. In your case that would be three:
1) count the occurrences of each word
2) filter out the too-common words (I, is, etc.)
3) sort the remaining words by their counts

1) is usually done by using a Map<String,Integer>, where the keys are the words and the values are their occurrence counts. Use a TreeMap created with String.CASE_INSENSITIVE_ORDER as its comparator to ignore the case of the words.
2) can be done by having a Collection<String> (or Set<String>) with too-common words, then removing those from the map (map.keySet().removeAll(commonWords)).
3) can be done by adding all the Map.Entry objects to a List, which you then sort using Collections.sort and a custom Comparator that compares the counts (the entry values).

After those steps you can use the List to access the entries in the right order.
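Putting those three steps together might look something like the sketch below. It assumes Java 8+ (for Map.merge and lambdas); the sample text, the stop-word list, and the method names countWords and sortByCount are made up for illustration. One wrinkle worth knowing: keySet().removeAll(...) (inherited from AbstractSet) either asks the map to remove each common word or asks the common-word collection whether it contains each key, depending on which of the two is larger, so the common words also go into a case-insensitive TreeSet to get the same result either way.

```java
import java.util.*;

public class WordCounts {

    // Step 1: count each word. The case-insensitive TreeMap treats "Code" and "code"
    // as the same key (the spelling seen first is the one kept).
    static Map<String, Integer> countWords(String text) {
        Map<String, Integer> counts = new TreeMap<>(String.CASE_INSENSITIVE_ORDER);
        for (String word : text.split("[^A-Za-z']+")) {
            if (!word.isEmpty()) {
                counts.merge(word, 1, Integer::sum);
            }
        }
        return counts;
    }

    // Step 3: copy the entries into a List and sort by count, highest first;
    // equal counts fall back to alphabetical order.
    static List<Map.Entry<String, Integer>> sortByCount(Map<String, Integer> counts) {
        List<Map.Entry<String, Integer>> entries = new ArrayList<>(counts.entrySet());
        Collections.sort(entries, (a, b) -> {
            int byCount = Integer.compare(b.getValue(), a.getValue());
            return byCount != 0 ? byCount : a.getKey().compareToIgnoreCase(b.getKey());
        });
        return entries;
    }

    public static void main(String[] args) {
        // Hypothetical sample text and stop-word list.
        String text = "I love code. Code is fun and I love coding algorithms.";
        Set<String> commonWords = new TreeSet<>(String.CASE_INSENSITIVE_ORDER);
        commonWords.addAll(Arrays.asList("I", "is", "and"));

        Map<String, Integer> counts = countWords(text);            // step 1
        counts.keySet().removeAll(commonWords);                     // step 2
        for (Map.Entry<String, Integer> e : sortByCount(counts)) {  // step 3
            System.out.println(e.getKey() + " = " + e.getValue());
        }
        // Prints one per line: code = 2, love = 2, algorithms = 1, coding = 1, fun = 1
    }
}
```

For the original question, the same pattern applies with phrases as the map keys instead of single words.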
 