..but logically the pattern "\\b[a-zA-Z]+\\.?\\b" should match all words "with one period"
No, because after the period it looks for a word boundary \b, and it doesn't (usually) find one, because the word already ended before the '.'. The '.' is not a word character, so there is a boundary after it only when a word character immediately follows.
and those words "without a period" because everyone knows that the metacharacter "?" means optional, so the pattern above may match one period or nothing at all.
It also matches words with two periods, because it can simply ignore the period (the ? means it's not required to take it) and it's still at the word boundary, so the final \b matches.
Does this mean that the Java Regular Expression engine has a bug?
Dunno if there are other bugs, but this isn't one - it's a problem with your pattern.
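If you want to watch the trailing \b at work, here's a quick sketch. The class name BoundaryDemo, the helper findAll and the sample input are made up for illustration, and the second pattern (the same one with the final \b dropped) is only a variant for contrast, not necessarily the pattern you need.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class BoundaryDemo {
    // Hypothetical helper: collects every match of regex found in input.
    public static List<String> findAll(String regex, String input) {
        List<String> hits = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(input);
        while (m.find()) {
            hits.add(m.group());
        }
        return hits;
    }

    public static void main(String[] args) {
        String input = "end. etc.. a.b";

        // Original pattern: the final \b fails after a '.' unless a word
        // character follows, so the engine backtracks and gives up the period.
        System.out.println(findAll("\\b[a-zA-Z]+\\.?\\b", input));
        // prints [end, etc, a., b]

        // Variant without the final \b (an assumption, for contrast only):
        // the optional period is now kept whenever one is present.
        System.out.println(findAll("\\b[a-zA-Z]+\\.?", input));
        // prints [end., etc., a., b]
    }
}
```

Note that "a." keeps its period even with the original pattern, because the 'b' right after the '.' gives the final \b a boundary to match. That's the "(usually)" case.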
The classic reference for learning about regexes is
Mastering Regular Expressions by Jeffrey Friedl. Highly recommended. Also, Max Habibi (bartender here at the ranch) has the upcoming
Real World Regular Expressions with Java 1.4, which will be worth checking out, focusing more specifically on Java's java.util.regex package. Also useful: if you use Eclipse, try the
RegEx Tester plug-in. Or for more traditional regexes (no possessive quantifiers) you can use the
Regex Coach. There are probably others; these are the ones I've tried.
[ October 10, 2003: Message edited by: Jim Yingst ]