Win a copy of Kotlin in Action this week in the Kotlin forum!
  • Post Reply Bookmark Topic Watch Topic
  • New Topic

any good algorithms for a spellchecker?  RSS feed

 
Max Huang
Greenhorn
Posts: 10
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
Hi, can anyone recommend a very fast but SIMPLE algorithm for a spellchecker?
 
David Weitzman
Ranch Hand
Posts: 1365
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
Since you say 'simple', I'm guessing you want to use a spellchecker but don't want to have to write something complex. If that's the case, check out Jazzy.
You can do some algorithmic stuff, but people tend to get good results with techniques that try to determine how a word sounds. One system for turning letters into sounds is the Metaphone algorithm. One that used to be popular was called Soundex. If you perform a few mutations on a word (deleting characters, switching them, etc.) and look them all up using a sound-based algorithm, you usually get pretty reasonable results.
[ February 23, 2003: Message edited by: David Weitzman ]
 
Jim Yingst
Wanderer
Sheriff
Posts: 18671
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
Well if you want to code something yourself, the first thing I'd try is loading all the valid English words you can find into a HashSet. Then to check text, use a StringTokenizer (or StreamTokenizer or maybe String's split() method) to break the text into separate words, and check each word to see if it's in the HashSet. If not, flag that word as a possible error. This allows you to detect errors, but not to suggest reasonable alternatives (which is what some of David's links above will get into, if you need that.) You may also want to develop special procedures for handling capitalization - the simplest is to not worry about it, and just convert everything to lower case.
Essentially, use my advice if this is a learning project for your own benefit and you're just beginning; use David's advice if you're more advanced, or need the software to be higher-quality, or you just want to use spell-checking, without coding it yourself.
[ February 23, 2003: Message edited by: Jim Yingst ]
 
Max Huang
Greenhorn
Posts: 10
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
Hi all, thanks for your suggestions.
Actually Jim I am interested in methods for spell checking suggestions, not simply spell checking.
I am just curious about nice algorithms for it. I suppose you could come up with something optimized if you knew the domain. For example, Weitzman's word-sounding idea is fascinating. If you guys know of any other domain-specific spell-checking-like suggestion type methods, please share!
So it's not that I don't want to hear complex ideas I just want to understnad the methods involved. I glanced at Jazzy but I guess I'll have to study it further because I still don't know what methods they used for the spellchecking suggestions.
Soundex, eh. Deleting and switching characters... so I guess that is optimized for typical typing errors, where people tend to inverse characters and leave them out.
 
David Weitzman
Ranch Hand
Posts: 1365
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
Just a few more possiblilities, although they'll probably give less useful suggestions than sound-based algorithms:
You can locate other words that have a small minimum edit distance (also known as Levenshtein distance) from whatever word you're looking up. Google gives some decent links.
There's a JavaWorld article that shows how to check spelling with a ternary search tree.
 
Max Huang
Greenhorn
Posts: 10
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
That's cool. Thanks for the link and suggestions. I wonder what "ispell" uses.
Originally posted by David Weitzman:
Just a few more possiblilities, although they'll probably give less useful suggestions than sound-based algorithms:
You can locate other words that have a small minimum edit distance (also known as Levenshtein distance) from whatever word you're looking up. Google gives some decent links.
There's a JavaWorld article that shows how to check spelling with a ternary search tree.
 
  • Post Reply Bookmark Topic Watch Topic
  • New Topic
Boost this thread!