
# Help me understand Regex quantifiers

Oceana Wickramasinghe
Ranch Hand
Posts: 77
Hello guys, for days I've been struggling to figure out how quantifiers work. I've used several sources, including Kathy Sierra's book, but none of them explains quantifiers in a way that I can understand. There are several things I don't understand about quantifiers, mainly to do with the way the explanations are worded.

For example, * means zero or more occurrences. What exactly does "zero or more occurrences" mean? Can someone simplify this statement for me?

Secondly, I want to know exactly how the mechanism works. If I were to match the string "nnnnnnuuuuuuuuulllll" with the pattern "nu" followed by each quantifier, I would get

"nuuuuuuuuu" with +
"nnnnnnuuuuuuuuu" with *
and
"nnnnnnu" with ?

What exactly happens in the background when I execute this? I want someone to explain step by step why each quantifier behaves the way it does. Why am I getting multiple "n"s and "u"s when what I want it to search for is "nu"? It's as if quantifiers break down the pattern and treat each character as a different pattern.

Jeff Verdegan
Bartender
Posts: 6109
Oceana Wickramasinghe wrote: Hello guys, for days I've been struggling to figure out how quantifiers work. I've used several sources, including Kathy Sierra's book, but none of them explains quantifiers in a way that I can understand. There are several things I don't understand about quantifiers, mainly to do with the way the explanations are worded.

For example, * means zero or more occurrences. What exactly does "zero or more occurrences" mean? Can someone simplify this statement for me?

It's hard to make that any simpler or clearer. "Zero or more" means "zero or more". That is, ">= 0 instances of the thing we're matching." So, if we have "X*", that means no Xes at all, or "X", or "XXXXXXXXXX" will all match.
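A minimal sketch of that, in case seeing it run helps. The class name `ZeroOrMore` and the helper `isOnlyXs` are made up for illustration; `Pattern.matches` is the standard `java.util.regex` call that tests whether the whole input matches the regex.

```java
import java.util.regex.Pattern;

public class ZeroOrMore {
    // Hypothetical helper: true if the WHOLE string consists of zero or more 'X' characters.
    static boolean isOnlyXs(String s) {
        return Pattern.matches("X*", s);
    }

    public static void main(String[] args) {
        // The empty string has zero Xes, so it still satisfies "zero or more".
        String[] inputs = {"", "X", "XXXXXXXXXX", "XY"};
        for (String s : inputs) {
            System.out.println("\"" + s + "\" matches X*: " + isOnlyXs(s));
        }
    }
}
```

The first three inputs should report true and "XY" should report false, because `Pattern.matches` requires the entire string to fit the pattern and the "Y" is left over.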

Oceana Wickramasinghe wrote: Secondly, I want to know exactly how the mechanism works.

That's a combination of two things:

1. Implementation dependent stuff.
2. Stuff that is way beyond what can be explained in a forum.

The source code for all the Java core API classes is available in the src.zip file that comes with the JDK download. You can look there. Or google for something like "regular expression specification".

Oceana Wickramasinghe wrote: What exactly happens in the background when I execute this? I want someone to explain step by step why each quantifier behaves the way it does.

Roughly (VERY roughly), it looks at each character and asks, "Can what I've matched so far, plus this next character, match the current part of the regex?" If not, it asks, "If I back up a character, can what I've matched so far match the current part of the regex while the next character matches the next part of the regex?"
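One thing worth checking against the example in the question: a quantifier applies only to the element immediately before it, so "nu+" means one "n" followed by one or more "u", not one or more "nu". The sketch below assumes the matches were collected with a `Matcher.find()` loop (the original code isn't shown, so that is a guess); the class `QuantifierDemo` and the helper `findAll` are hypothetical names. It lists each match separately so you can see where one ends and the next begins.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class QuantifierDemo {
    // Hypothetical helper: collects every successive match that Matcher.find() reports.
    static List<String> findAll(String regex, String input) {
        List<String> matches = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(input);
        while (m.find()) {
            matches.add(m.group());
        }
        return matches;
    }

    public static void main(String[] args) {
        String input = "nnnnnnuuuuuuuuulllll";
        // "(nu)+" groups the two characters so the quantifier applies to "nu" as a unit.
        for (String regex : new String[] {"nu+", "nu*", "nu?", "(nu)+"}) {
            System.out.println(regex + " -> " + findAll(regex, input));
        }
    }
}
```

With "nu*" and "nu?", every lone "n" is a match on its own, because zero "u"s is allowed after it. If those matches are printed back to back with no separator, they run together into one long string, which appears to be where the multiple "n"s in the question come from. To quantify "nu" as a whole, put it in a group, as in "(nu)+".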

Darryl Burke
Bartender
Posts: 5167
I've found this tutorial very useful for gaining some understanding of Regex syntax: http://www.regular-expressions.info/