You have been given a large list of people’s names and have been asked to highlight all duplicates where the exact same name appears more than once, as well as highlight any possible duplicates where there may be misspellings of the same name. Describe an algorithm you could use that would produce the required output.
The first part is easy - I could easily find duplicates using a nested for loop but how would find duplicates where there MAY be misspellings??? Do you create hash values for the names and compare hash values - if the hash values are close (not sure how you check that) then there may be similarly but misspelled names?
I've done a bit of reading after checking out this post, and found out about the Damerau-Levenshtein distance which I believe would be better suited.
Besides the three operations used in computing the Levenshtein distance - insertion, deletion and substitution - this one comes up with a fourth, the transposition of two adjacent characters, particularly useful in case of misspellings.