
Simple Regex

 
Tim Eapen
Greenhorn
Posts: 22
Hello everybody:

Here is some simple code:



Here is the output:



I can understand this. I don't understand the following.

Now I change the pattern object so that it uses a reluctant quantifier, as follows: Pattern reluctant = Pattern.compile("\\d*?");

When I do a match on the same input I get the following output:



It seems as if the pattern is matching the empty string on every character it encounters for reluctant matching. Why? This doesn't seem intuitive to me.
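The code and output from the original post are not shown here, so the following is only a sketch of the comparison being described. It assumes a hypothetical input string "ab34ef" and a hypothetical helper named matches(), runs the greedy pattern \\d* and the reluctant pattern \\d*? through the same find() loop, and prints the start index and matched text of each match.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class GreedyVsReluctant {
    // Hypothetical helper: returns "start:'text'" for every match find() reports.
    static List<String> matches(String regex, String input) {
        List<String> out = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(input);
        while (m.find()) {
            out.add(m.start() + ":'" + m.group() + "'");
        }
        return out;
    }

    public static void main(String[] args) {
        // Hypothetical input; the input used in the original post is not shown.
        String input = "ab34ef";
        // Greedy: \d* grabs as many digits as it can at each position.
        System.out.println("greedy    \\d*  : " + matches("\\d*", input));
        // Reluctant: \d*? first tries zero digits, and zero digits always succeeds.
        System.out.println("reluctant \\d*? : " + matches("\\d*?", input));
    }
}
```

With that input, the greedy pattern reports "34" at index 2 plus empty matches at 0, 1, 4, 5 and 6, while the reluctant pattern reports an empty match at every index from 0 through 6, including index 6, just past the last character.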

Tim
 
Franz Fountain
Ranch Hand
Posts: 58
My guess is that the reluctant quantifier finds the shortest string that will match the pattern. That is the empty string, since "\\d*?" can match zero or more digits. It seems that no matter what the input string is, "\\d*?" will always match an empty string at every position. I guess the *? quantifier only makes sense when it is followed by something. For example, "\\d*?6" would make sense in this example.

This is a good question. I hope someone with more experience with regex will give a more definitive answer.
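To illustrate the point that a reluctant quantifier becomes useful once something follows it, here is a small sketch comparing \\d*6 with \\d*?6. The input "12616" and the helper matches() are hypothetical, chosen only so that the two patterns behave differently.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ReluctantSuffix {
    // Hypothetical helper: returns "start:'text'" for every match find() reports.
    static List<String> matches(String regex, String input) {
        List<String> out = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(input);
        while (m.find()) {
            out.add(m.start() + ":'" + m.group() + "'");
        }
        return out;
    }

    public static void main(String[] args) {
        String input = "12616"; // hypothetical input containing two 6s
        // Greedy: \d* takes every digit, then backs off until the next character is a 6.
        System.out.println("\\d*6  -> " + matches("\\d*6", input));
        // Reluctant: \d*? takes as few digits as possible before reaching a 6.
        System.out.println("\\d*?6 -> " + matches("\\d*?6", input));
    }
}
```

The greedy version consumes as many digits as it can and then gives back just enough to end on the last 6, so it reports a single match, 12616. The reluctant version stops at the first 6 it reaches, so find() reports 126 starting at index 0 and then 16 starting at index 3.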
 
Henry Wong
author
Marshal
Posts: 21496
It seems as if the pattern is matching the empty string on every character it encounters for reluctant matching. Why? This doesn't seem intuitive to me.


As already mentioned, the reluctant pattern you created will match an empty string (a zero-length match). What isn't intuitive is what happens afterwards. To understand that, you need to understand what the find() method does. Here is the relevant quote from the Javadoc:

This method starts at the beginning of the input sequence or, if a previous invocation of the method was successful and the matcher has not since been reset, at the first character not matched by the previous match.


Normally, the find() method starts the search for the next match at the end of the previous match. The exception is a zero-length match: because it consumed no characters, starting again at the same position would simply find the same empty match, so in that case find() starts at the next character. This is why it is matching "on every character it encounters".
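The rule described above can be checked with a short sketch. The method viaFind() collects the spans reported by repeated calls to find(), and viaRule() tries to reproduce them by calling find(int) from an explicit position that moves to the end of each match, or one character further when the match was empty. Both method names and the input "ab34ef" are hypothetical, and viaRule() is a model of the documented behavior, not the actual JDK implementation.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class FindAdvance {
    // Spans [start,end) reported by repeated calls to find().
    static List<String> viaFind(Pattern p, String input) {
        List<String> spans = new ArrayList<>();
        Matcher m = p.matcher(input);
        while (m.find()) {
            spans.add("[" + m.start() + "," + m.end() + ")");
        }
        return spans;
    }

    // Hand-rolled loop applying the rule: resume at the end of the previous
    // match, but step one character further if that match was empty.
    static List<String> viaRule(Pattern p, String input) {
        List<String> spans = new ArrayList<>();
        Matcher m = p.matcher(input);
        int from = 0;
        // find(int) throws if from > input.length(), so check the bound first.
        while (from <= input.length() && m.find(from)) {
            spans.add("[" + m.start() + "," + m.end() + ")");
            from = (m.start() == m.end()) ? m.end() + 1 : m.end();
        }
        return spans;
    }

    public static void main(String[] args) {
        String input = "ab34ef"; // hypothetical input
        for (String regex : new String[] {"\\d*", "\\d*?"}) {
            Pattern p = Pattern.compile(regex);
            System.out.println(regex + " find(): " + viaFind(p, input));
            System.out.println(regex + " rule:   " + viaRule(p, input));
        }
    }
}
```

For \\d* both lists should be [0,0) [1,1) [2,4) [4,4) [5,5) [6,6), and for \\d*? both should be seven empty spans, [0,0) through [6,6). Note that after the non-empty match [2,4) the search resumes at 4 rather than 5, which is the "end of the previous match" case from the quote.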

Henry
 