
# regex

Komal Arora
Ranch Hand
Posts: 91
I am not able to understand the problems on regex (greedy quantifiers). For instance, consider the following problem:

How does it produce the output 1 b345 f0?

Kevin Workman
Ranch Hand
Posts: 151
Komal Arora wrote: How does it produce the output 1 b345 f0?

What did you expect it to produce?

Komal Arora
Ranch Hand
Posts: 91
I did not understand how the digit 5 ended up in the answer. The book says that greedy quantifiers scan the entire source data and then work backwards until they find an appropriate match. I always get confused about how it does that!

Komal Arora
Ranch Hand
Posts: 91
oh no wait, got it

Dammit!, lack of concentration!

Kevin Workman
Ranch Hand
Posts: 151
Komal Arora wrote:oh no wait, got it

Cool. You might want to share what you figured out, in case anybody else has a similar problem. Your post might come up in a Google search that somebody else runs in the future.

Komal Arora
Ranch Hand
Posts: 91
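We need to find a match according to the pattern [a-f]\d+, i.e. one letter in the range a to f, followed by one or more (+ quantifier) digits in a row.
One such match is found at position 1 (b34) and the next at position 5 (f0). The program prints "1 b34" and then "5 f0" with nothing in between, hence the output 1 b345 f0. The 5 is the start position of the second match, not a digit from the string.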

Jelle Klap
Bartender
Posts: 1952
The pattern will match a sequence of characters which consists of exactly one occurrence of the characters a, b, c, d, e or f, followed by at least one digit.
Now, given the String ab34ef0, which sub-sequences match this pattern?

Let's start at index 0 and work our way through the sequence:

0 - No match here, a is a matching character, but it should be followed by one or more digits, and b certainly isn't that.

1 - Found a match! b is a matching character, followed by 3, which is a digit! So are we done with this match? Not quite, because the greedy + quantifier will try to match as much of the sequence as it can, and the next character in sequence is 4, which is also a digit. Now we're done with this match, because the next character in sequence is e, which would break the pattern. Right, so now we print the starting position of this match (Matcher.start()), 1, and the match itself (Matcher.group()), b34, to the console, separated by a single white space, and we don't add a line separator, because we made a call to print(), not println().

4 - Wait a minute, isn't 1 usually followed by 2? Well yes, but the previous match has 'consumed' the sequence up to and including the index position where that match ended: 3. OK, so starting at position 4 we find e followed by f, which doesn't match the pattern.

5 - Found another one! The sub-sequence f0 matches the pattern quite nicely. So let's print that to the console as well: 5 f0. Now, because we're using print() instead of println(), the output will be appended directly to the previous output.

Now we're done - the entire sequence has been consumed - and the output reads 1 b345 f0.
And that's the way the cookie crumbles

Edit: Oh crud, typing up my reply took longer than I thought, and the question has been answered in the meantime. Oh well...
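The walkthrough above can be checked with a short program. The original listing is not reproduced in this thread, so what follows is only a sketch of what it presumably does, assuming it prints Matcher.start(), a single space, and Matcher.group() using print(). The class name GreedyFindDemo and the helper findAll are made up for illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class GreedyFindDemo {
    // Hypothetical helper: collects "<start> <group>" for every match,
    // with nothing between matches, just like repeated print() calls.
    static String findAll(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        StringBuilder out = new StringBuilder();
        while (m.find()) {
            out.append(m.start()).append(" ").append(m.group());
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // First match: start 1, group b34. Second match: start 5, group f0.
        System.out.print(findAll("[a-f]\\d+", "ab34ef0")); // prints 1 b345 f0
    }
}
```

Swapping print() for println() in a program like this would put each match on its own line, which makes it much easier to see where one match ends and the next start position begins.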

Komal Arora
Ranch Hand
Posts: 91
Wow, that was a very nice explanation.

I was just going through regex questions when I found this one:

and the command line:

java Regex2 "\d*" ab34ef

For this the output is: 01234456
Now how did that come about?

Jelle Klap
Bartender
Posts: 1952
Well, this has to do with zero-length matching, as described in this tutorial.
I suggest you read that first, and then come back to give this one a try yourself.

Komal Arora
Ranch Hand
Posts: 91
OK. So the * quantifier says "zero or more occurrences".
Our string is ab34ef.

At index 0 - there are zero occurrences, hence 0 is printed.

At index 1 - there are again zero occurrences, hence 1 is printed.

At index 2 - there is a match, the group 34, hence 234 is printed.

At index 4 - zero occurrences, we print 4.

At index 5 - zero occurrences, we print 5.

Where did the 6 come from then?

Jelle Klap
Bartender
Posts: 1952
That would be the zero-length match after the last character in the sequence.
Have a look.
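To make the zero-length matches visible, here is a small sketch that records the start and end index of every match. It assumes the Regex2 program from the question simply loops over Matcher.find() (the listing itself is not shown in this thread). ZeroLengthDemo and spans are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class ZeroLengthDemo {
    // Hypothetical helper: one entry per match, formatted as "start-end:group".
    // A zero-length match has start == end and an empty group.
    static List<String> spans(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        List<String> result = new ArrayList<>();
        while (m.find()) {
            result.add(m.start() + "-" + m.end() + ":" + m.group());
        }
        return result;
    }

    public static void main(String[] args) {
        // Prints [0-0:, 1-1:, 2-4:34, 4-4:, 5-5:, 6-6:]
        System.out.println(spans("\\d*", "ab34ef"));
    }
}
```

The last entry, 6-6, is the empty match at the position just past the final character f. Concatenating each start index with its group gives 0, 1, 234, 4, 5, 6, which is the 01234456 from the question.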

Neha Daga
Ranch Hand
Posts: 504
Read the chapter in K&B carefully. It says that a quantifier will also be checked at the position after the last character in the string. That is the position right after the last character, and in this case it matches the pattern, which asks for zero or more digits.

Well, I am late.

Komal Arora
Ranch Hand
Posts: 91
Hey, I finally got the problem. Thanks, Jelle!

And Neha, does this hold for all three quantifiers? Do ALL of them look at the position after the last character of the string?

Jelle Klap
Bartender
Posts: 1952
Let's answer that with a question: is a zero-length match a possibility for all three quantifiers?

Komal Arora
Ranch Hand
Posts: 91
Nope, only for * and ?
So that means only * and ? will produce a match at the position after the end of the string?

Jelle Klap
Bartender
Posts: 1952
Bingo.
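For anyone who wants to verify this, a minimal sketch comparing the three quantifiers on the same input follows. EndOfInputDemo and matchesAtEnd are hypothetical names, and the input ab34ef is the one from the question above.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class EndOfInputDemo {
    // Hypothetical helper: true if find() reports a match that starts at
    // input.length(), i.e. just past the last character. Such a match is
    // necessarily zero-length.
    static boolean matchesAtEnd(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        boolean found = false;
        while (m.find()) {
            if (m.start() == input.length()) {
                found = true;
            }
        }
        return found;
    }

    public static void main(String[] args) {
        String input = "ab34ef";
        for (String regex : new String[] {"\\d*", "\\d?", "\\d+"}) {
            // Prints true for \d* and \d?, false for \d+
            System.out.println(regex + " -> " + matchesAtEnd(regex, input));
        }
    }
}
```

Only a pattern that can match the empty string yields a match at that final position, which is why \d+ never reports one: it needs at least one digit, and there is nothing left to consume.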

Komal Arora
Ranch Hand
Posts: 91
YAY

You are a good teacher