as is my nature, before reading more about them, i post a question. this is just because i don't get a lot of internet time.
ok i need, mainly, how to match whole words only. this might seem easy. just check if there is a space before and a space after. first of all i don't know how to do that, and worst of all what about the cases where the match is at the very beginning or very end of the string?
i have learned there are many ways to search: indexOf(), contains(), matches(); the classes Pattern and Matcher. it looks like regex is the way to go.
Depending on the regex implementation, you may have the character class \w (Word) available. So "\w+" matches whole words. As Fred points out though, it depends how you define a word. I don't think that pattern matches contractions, like "don't".
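To see what "\w+" actually picks out, here is a minimal sketch in Java. The class and method names (WordTokens, words) are made up for illustration; Pattern and Matcher are the standard java.util.regex classes.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class WordTokens {
    // \w+ means one or more word characters (letters, digits, underscore).
    private static final Pattern WORD = Pattern.compile("\\w+");

    // Hypothetical helper: collects every run of word characters in the text.
    static List<String> words(String text) {
        List<String> found = new ArrayList<>();
        Matcher m = WORD.matcher(text);
        while (m.find()) {
            found.add(m.group());
        }
        return found;
    }

    public static void main(String[] args) {
        // The apostrophe is not a word character, so "don't" comes back as two tokens.
        System.out.println(words("I don't know")); // prints [I, don, t, know]
    }
}
```

Words at the very start or end of the string are found without any special handling, but the contraction is split in two, which is the limitation mentioned above.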
It's a good idea to have a go-to site for testing regular expressions, because they're tricky to get right. Mine is rubular.com, but there are many others.
"\b" represents a word boundary (see the Javadoc for Pattern) but there are a couple of 'gotchas'. First, it does not handle contractions, and second, you will need to escape the backslash in a Java string literal, i.e. use "\\b". As always with regular expressions the devil is in the detail, but you have not yet provided any.
thanks for the answers. i didn't know about a go-to site for testing. i still need to read about regex more, i only glanced at it so far. if you have microsoft's wordpad, i am recreating the find and replace dialogs. checkboxes for Match Case and Whole Word Only. i also realized a word can end in punctuation(.,;
i see what you mean by escaping. the : plus the ) got interpreted as a smiley face
i have part of it already. Replace All: Match Case was easy i just used the String method replaceAll() using the contents of the two text fields.
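One thing to watch with replaceAll() on raw text-field contents: the search text is treated as a regex and the replacement text treats $ and \ specially. Below is a rough sketch of how the two checkboxes could map onto a Pattern. The class name, method name and parameters are hypothetical, not taken from the actual dialog code.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class FindReplaceSketch {
    // Hypothetical helper mirroring the dialog's two checkboxes.
    static String replaceAll(String text, String find, String replaceWith,
                             boolean matchCase, boolean wholeWord) {
        // Pattern.quote makes the user's search text literal ("." stays a dot).
        String regex = Pattern.quote(find);
        if (wholeWord) {
            // \b also matches at the very start and end of the string.
            regex = "\\b" + regex + "\\b";
        }
        int flags = matchCase ? 0 : (Pattern.CASE_INSENSITIVE | Pattern.UNICODE_CASE);
        // quoteReplacement keeps "$" and "\" in the replacement text literal.
        return Pattern.compile(regex, flags)
                      .matcher(text)
                      .replaceAll(Matcher.quoteReplacement(replaceWith));
    }

    public static void main(String[] args) {
        System.out.println(replaceAll("cat concat cat", "cat", "dog", true, true));   // prints dog concat dog
        System.out.println(replaceAll("Can can candy", "can", "dan", false, true));   // prints dan dan candy
        System.out.println(replaceAll("a.b", ".", "x", true, false));                 // prints axb
    }
}
```

For comparison, "a.b".replaceAll(".", "x") gives "xxx", because the unquoted dot matches any character. Note that \b only makes sense when the search text starts and ends with a word character.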
I think you must know what text you are analyzing. If it contains locale-specific symbols, you cannot use something like [a-zA-Z], which covers Latin letters only, but Java has something like this:
which reads as: the Unicode property "letter" (covering all languages), OR a certain symbol of your own choosing, which must sit between word boundaries (\b), OR the apostrophe ('). That way we can catch all letters in words like O'Reilly or well-done.
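A minimal sketch of that idea, not necessarily the same pattern as the one described above: it assumes a word is a run of Unicode letters (\p{L}), optionally joined by single apostrophes or hyphens. The class and method names are invented for the example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class UnicodeWords {
    // Assumed definition of a word: letters in any language, with single
    // apostrophes or hyphens allowed between letter runs.
    private static final Pattern WORD =
            Pattern.compile("\\p{L}+(?:['-]\\p{L}+)*");

    // Hypothetical helper: returns every match of the pattern above.
    static List<String> words(String text) {
        List<String> found = new ArrayList<>();
        Matcher m = WORD.matcher(text);
        while (m.find()) {
            found.add(m.group());
        }
        return found;
    }

    public static void main(String[] args) {
        // \u00e9 is the letter e with an acute accent.
        System.out.println(words("O'Reilly is well-done, caf\u00e9!"));
    }
}
```

With this definition "O'Reilly" and "well-done" each come back as one word, and accented letters count as letters. Digits and underscores are deliberately left out; add \p{N} or _ to the character sets if your text needs them.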
Randall Twede wrote: actually \b does seem to handle contractions.
Interesting, since my simple example shows that it doesn't! The code below produces an array containing all the words but with all the apostrophes separated from their adjoining words. If the contractions were being handled as being part of the word then they would be included in the word in the split. Am I missing something?
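A minimal sketch of that kind of check, not necessarily the exact code from the post: split a sentence on "\\b" and look at the pieces. The class and method names are made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;

class BoundarySplit {
    // Hypothetical helper: splits on every word boundary and drops empty pieces
    // (older Java versions can emit an empty first element for the boundary at index 0).
    static List<String> splitOnBoundaries(String text) {
        List<String> parts = new ArrayList<>();
        for (String piece : text.split("\\b")) {
            if (!piece.isEmpty()) {
                parts.add(piece);
            }
        }
        return parts;
    }

    public static void main(String[] args) {
        // There is a boundary on each side of the apostrophe,
        // so "don't" is cut into "don", "'" and "t".
        System.out.println(splitOnBoundaries("I don't know"));
    }
}
```

The apostrophe ends up as its own piece between "don" and "t", so \b is treating the contraction as two words with punctuation in between.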
Randall Twede wrote:
maybe it is because i have java 7
Sorry but that is not the reason. It has been the same since the regex package was introduced into Java and, though I cannot claim to have tried all regex implementations, I have never met a regex flavour in any language that is different.
i see the problem now. if i type
can can't candy
then i say replace all can with dan, whole word only
dan dan't candy
clearly a problem, but when i tried it in Wordpad and Open Office i got the same results
so i guess i won't worry about it
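If it ever does matter, one possible workaround is to swap \b for lookarounds that count the apostrophe as part of a word. This is only a sketch under that assumption; the class name, method name and the choice of "word characters" are mine, not something WordPad or Open Office does.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class ApostropheAwareReplace {
    // Assumption: letters, digits, underscore and the apostrophe all count
    // as characters that can be part of a word.
    private static final String WORD_CHAR = "[\\p{L}\\p{N}_']";

    // Hypothetical helper: replaces `word` only when no word character
    // (as defined above) sits directly before or after it.
    static String replaceWholeWord(String text, String word, String replacement) {
        Pattern p = Pattern.compile(
                "(?<!" + WORD_CHAR + ")" + Pattern.quote(word) + "(?!" + WORD_CHAR + ")");
        return p.matcher(text).replaceAll(Matcher.quoteReplacement(replacement));
    }

    public static void main(String[] args) {
        System.out.println(replaceWholeWord("can can't candy", "can", "dan")); // prints dan can't candy
    }
}
```

The trade-off is that a word wrapped in single quotes, like 'can', no longer counts as a whole word, so this is a judgment call rather than a clear improvement.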