Let's say I have two Strings...
What I would like to do is detect what percentage of words match between the two. I would LOVE to do sequence detection (http://59.108.48.12/proceedings/sigir/sigir2010/docs/p675.pdf), but I think that's too difficult.
This method is severely flawed, but at least it'll give me a start. Let's first find the words that appear in both strings:
11 words match
s1 has 18 words
s2 has 15 words
Use smaller of the two...
% match = 11/15 = 73% match
Remove common words (my,is,I,to,and). Resulting words...
name, Justin, like, code, have, fun
% match = 6/15 = 40% match
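To make the arithmetic above concrete, here is a minimal Java sketch of that same counting approach. The class name `WordOverlap`, the method names, and the tokenizing regex are all my own assumptions for illustration. It also assumes that a repeated word counts only once as a match, that stop words are passed in lowercase, and that the denominator stays the raw word count of the shorter string even after common words are removed, as in the numbers above.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;
import java.util.stream.Collectors;

public class WordOverlap {

    // Lowercase the text and split on any run of characters that is not
    // a letter, a digit, or an apostrophe. Empty tokens are dropped.
    public static List<String> tokenize(String text) {
        return Arrays.stream(text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}']+"))
                .filter(w -> !w.isEmpty())
                .collect(Collectors.toList());
    }

    // Percentage of shared words, relative to the shorter string's word count.
    // stopWords (assumed lowercase) are ignored when counting matches.
    public static double percentMatch(String s1, String s2, Set<String> stopWords) {
        List<String> words1 = tokenize(s1);
        List<String> words2 = tokenize(s2);

        // "Use smaller of the two": denominator is the shorter word count.
        int smaller = Math.min(words1.size(), words2.size());
        if (smaller == 0) {
            return 0.0; // nothing to compare
        }

        // Distinct words that occur in both strings, minus the stop words.
        Set<String> shared = new HashSet<>(words1);
        shared.retainAll(new HashSet<>(words2));
        shared.removeAll(stopWords);

        return 100.0 * shared.size() / smaller;
    }
}
```

For example, `WordOverlap.percentMatch("the quick brown fox", "the lazy brown dog jumps", Set.of())` gives 50.0 (2 shared words out of 4), and passing `Set.of("the")` as the stop words drops it to 25.0. Word order is ignored entirely, which is one of the reasons this approach is flawed.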
Is there already a Java function that could give me some percentage of partial duplicates? If not, how can I write an algorithm that can compare two strings and figure out what percentage they have in common?
Here's my pseudocode with the approach that I don't like...
I would really appreciate better ideas or code snippets... Thanks!