
Compare similar String

 
kc pradeep
Greenhorn
Posts: 29
Hi All,

I want to write a function that returns a measure of similarity between the two strings passed to it.

Say String A and String B are passed to the function. It should then return a value from 0 to 100, with 0 meaning the strings are totally different and 100 meaning they are exactly the same.

The similarity should take into consideration the order of the words.

Is there already a library for this?

Please help

 
Campbell Ritchie
Marshal
Posts: 56546
You will have to work out which algorithm you wish to use. Also, what does "similarity" mean? If you mean the same letter in the same place, then what about Ritchie and CRitchie? They would come out as 0 even though the latter String contains the former as a substring.
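A minimal sketch of the naive "same letter in the same place" measure, scaled to 0 to 100 as the original question asks. The class and method names are hypothetical, and counting positions past the end of the shorter string as mismatches is an assumption. It confirms that Ritchie and CRitchie score 0 even though one contains the other.

```java
class PositionalSimilarity {

    // Naive score: count positions where both strings have the same character,
    // scaled to 0..100. Positions beyond the shorter string count as mismatches.
    static int positionalSimilarity(String a, String b) {
        int longer = Math.max(a.length(), b.length());
        if (longer == 0) {
            return 100; // two empty strings are identical
        }
        int shorter = Math.min(a.length(), b.length());
        int matches = 0;
        for (int i = 0; i < shorter; i++) {
            if (a.charAt(i) == b.charAt(i)) {
                matches++;
            }
        }
        return matches * 100 / longer;
    }

    public static void main(String[] args) {
        System.out.println(positionalSimilarity("Ritchie", "CRitchie")); // 0
        System.out.println("CRitchie".contains("Ritchie"));              // true
    }
}
```

The one-character shift lines every letter up against a different one, which is why a position-by-position comparison is usually too strict on its own.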
 
kc pradeep
Greenhorn
Posts: 29
Hi,

I don't want to compare the exact order, but the order should carry some weight when calculating how similar the two strings are.


Similar String: most of the words in the two strings are the same, and the words appear in the same order except for a few (say, at most 40% of the words).
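One possible way to give word order some weight without demanding an exact match (this approach is not proposed in the thread, so treat it as an assumption) is to split both strings into words and use the longest common subsequence (LCS) of the word lists: the largest set of words that occur in both strings in the same relative order. The score below is 2 × LCS divided by the total word count, scaled to 0 to 100. Lower-casing and splitting on whitespace are assumptions, and the class and method names are hypothetical.

```java
import java.util.Arrays;
import java.util.List;

class WordOrderSimilarity {

    // Lower-case, trim, and split on runs of whitespace (assumed tokenization).
    static List<String> words(String s) {
        String trimmed = s.trim().toLowerCase();
        if (trimmed.isEmpty()) {
            return List.of();
        }
        return Arrays.asList(trimmed.split("\\s+"));
    }

    // Classic dynamic-programming LCS over word lists:
    // dp[i][j] = LCS length of the first i words of a and the first j words of b.
    static int lcsLength(List<String> a, List<String> b) {
        int[][] dp = new int[a.size() + 1][b.size() + 1];
        for (int i = 1; i <= a.size(); i++) {
            for (int j = 1; j <= b.size(); j++) {
                if (a.get(i - 1).equals(b.get(j - 1))) {
                    dp[i][j] = dp[i - 1][j - 1] + 1;
                } else {
                    dp[i][j] = Math.max(dp[i - 1][j], dp[i][j - 1]);
                }
            }
        }
        return dp[a.size()][b.size()];
    }

    // 100 * 2 * LCS / (total words): identical word sequences score 100,
    // sequences with no word in common score 0.
    static int similarity(String a, String b) {
        List<String> wa = words(a);
        List<String> wb = words(b);
        int total = wa.size() + wb.size();
        if (total == 0) {
            return 100;
        }
        return 200 * lcsLength(wa, wb) / total;
    }

    public static void main(String[] args) {
        System.out.println(similarity("the quick brown fox", "the quick brown fox")); // 100
        System.out.println(similarity("the quick brown fox", "the brown quick fox")); // 75
        System.out.println(similarity("the quick brown fox", "lazy dogs sleep"));     // 0
    }
}
```

Swapping two words removes only one word from the common subsequence, so the score drops to 75 rather than 0. That lets order count without an out-of-place word ruining the whole comparison.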
 
Jacek Garlinski
Greenhorn
Posts: 8
You should search for something like 'text processing algorithms'.
 
Ulf Dittmer
Rancher
Posts: 42972
One approach would be the Double Metaphone algorithm (as implemented in the Apache Commons Codec library, for example). Another is the Levenshtein distance or Damerau-Levenshtein distance.
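A self-contained sketch of the Levenshtein distance mentioned above, with an optional step that counts swapping two adjacent characters as a single edit. That optional step gives the "optimal string alignment" variant, a restricted form of Damerau-Levenshtein rather than the full algorithm. Turning the distance into a 0 to 100 score by dividing by the length of the longer string is an assumption, not something from the thread, and the class and method names are hypothetical.

```java
class EditDistanceSimilarity {

    // Edit distance between a and b. With allowTranspositions == false this is
    // plain Levenshtein (insert, delete, substitute). With true, swapping two
    // adjacent characters also costs 1 (optimal string alignment variant).
    static int distance(String a, String b, boolean allowTranspositions) {
        int n = a.length();
        int m = b.length();
        int[][] d = new int[n + 1][m + 1];
        for (int i = 0; i <= n; i++) d[i][0] = i; // delete all of a's prefix
        for (int j = 0; j <= m; j++) d[0][j] = j; // insert all of b's prefix
        for (int i = 1; i <= n; i++) {
            for (int j = 1; j <= m; j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                d[i][j] = Math.min(Math.min(d[i - 1][j] + 1, d[i][j - 1] + 1),
                                   d[i - 1][j - 1] + cost);
                if (allowTranspositions && i > 1 && j > 1
                        && a.charAt(i - 1) == b.charAt(j - 2)
                        && a.charAt(i - 2) == b.charAt(j - 1)) {
                    d[i][j] = Math.min(d[i][j], d[i - 2][j - 2] + 1);
                }
            }
        }
        return d[n][m];
    }

    // Maps the distance onto 0..100: 100 for identical strings, 0 when the
    // distance equals the length of the longer string.
    static int similarity(String a, String b, boolean allowTranspositions) {
        int longer = Math.max(a.length(), b.length());
        if (longer == 0) return 100;
        return 100 - 100 * distance(a, b, allowTranspositions) / longer;
    }

    public static void main(String[] args) {
        System.out.println(distance("Ritchie", "CRitchie", false));   // 1
        System.out.println(similarity("Ritchie", "CRitchie", false)); // 88
        System.out.println(distance("Ritchie", "Rtichie", false));    // 2
        System.out.println(distance("Ritchie", "Rtichie", true));     // 1
    }
}
```

Unlike the position-by-position comparison, Ritchie and CRitchie are only one insertion apart here. The same table can be filled over arrays of words instead of characters, in which case the distance counts inserted, deleted, or swapped words.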
 