
# searching for "\d*" in a string

J Brewer
In the following question:

It says that the answer is E, but I get 0123445. I don't understand where the '6' comes from.
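The question's code did not survive in this copy of the thread. The rest of the discussion points to the pattern `\d*`, an input whose characters are a, b, 3, 4, e, f, and output built from `m.start()` followed by `m.group()` for each match. A minimal sketch under those assumptions might look like this. The class name, the `run` helper and the literal input `"ab34ef"` are hypothetical, not the book's code.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical reconstruction: class name, helper and input are assumptions.
class DigitStarFind {
    // Appends m.start() and then m.group() for every match reported by find().
    static String run(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        StringBuilder out = new StringBuilder();
        while (m.find()) {
            out.append(m.start()).append(m.group());
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // The final "6" comes from an empty match at index 6,
        // the position just past the last character of the 6-character input.
        System.out.println(run("\\d*", "ab34ef"));
    }
}
```

This prints 01234456. Dropping the last match would give the 0123445 from the post above; the extra 6 is the empty match at the very end of the input.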

wise owen

J Brewer
Thanks! I saw this in the K&B book, and on a mock exam, and I thought it was a mistake...

J Brewer
but of course I knew it was much more likely that I was mistaken.

Vijay Raj
It's a greedy quantifier, and that's why a zero-length match was applied at the end of the string, giving a 6 at the end. Fine. But why wasn't a zero-length match also applied at the beginning of the string, which would have resulted in "001234456"?

regards,
vijay.
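One way to check which property of `\d*` produces the trailing empty match is to run the same loop with the reluctant form `\d*?` and with `\d+`, which cannot match the empty string. This is a sketch with an illustrative class name, again assuming the input `"ab34ef"`:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Illustrative comparison of three quantifiers on the same (assumed) input.
class QuantifierCompare {
    // Same loop as the question: m.start() followed by m.group() per match.
    static String run(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        StringBuilder out = new StringBuilder();
        while (m.find()) {
            out.append(m.start()).append(m.group());
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String input = "ab34ef";
        System.out.println(run("\\d*", input));  // greedy; can match empty
        System.out.println(run("\\d*?", input)); // reluctant; can match empty
        System.out.println(run("\\d+", input));  // needs at least one digit
    }
}
```

This should print 01234456, 0123456 and 234. Both `*` forms report an empty match at index 6 while `\d+` reports none, which suggests the end-of-input match comes from `*` allowing zero digits rather than from greediness. Greediness only decides that "34" is consumed as one match at index 2.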

Henry Wong
Originally posted by Vijay Raj:
It's a greedy quantifier, and that's why a zero-length match was applied at the end of the string, giving a 6 at the end. Fine. But why wasn't a zero-length match also applied at the beginning of the string, which would have resulted in "001234456"?

regards,
vijay.

The zero-length match was applied at the beginning of the string, which is why the first value is zero. Are you asking whether the beginning of the string should be matched twice?

Henry
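Collecting the start index of each match reported by successive `find()` calls shows that index 0 is reported only once: after an empty match, the next search begins one position further on. A sketch with illustrative class and helper names, assuming the input `"ab34ef"`:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class FindStarts {
    // Collects m.start() for every match reported by successive find() calls.
    static List<Integer> starts(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        List<Integer> result = new ArrayList<>();
        while (m.find()) {
            result.add(m.start());
        }
        return result;
    }

    public static void main(String[] args) {
        // Index 3 never appears because it lies inside the "34" match.
        System.out.println(starts("\\d*", "ab34ef"));
    }
}
```

This prints [0, 1, 2, 4, 5, 6], so there is no second entry for index 0.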

Vijay Raj
The zero we got in the answer is because of 'a', the first character in the input string, right?
a - prints 0
b - prints 1
3 - prints 234 (2 being m.start() and 34 being m.group())
4 - prints nothing because it's already been visited
e - prints 4
f - prints 5
At last, it prints 6, where the zero-length match is performed. That's because f lies between index 5 and index 6. Since the while loop goes on until the end, that is, up to the length of the string, it performs a zero-length match. Am I right so far? I just need to confirm whether I am going in the right direction.

If so, then why isn't there a zero-length match at the beginning, that is, at index 0?

regards,
vijay.

Henry Wong
The zero we got in the answer is because of 'a', the first character in the input string, right.

No... "a" does not match the regular expression -- neither do "b", "e", or "f". If "a" did match the regular expression, then the output would have been "0a", instead of "0".

Henry
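Henry's point can be checked directly. `Pattern.matches` tests whether an entire string matches, and the first `find()` on the input reports an empty group at index 0, not "a". The class and helper names below are illustrative, and the input is assumed to be `"ab34ef"`.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class FirstMatch {
    // Describes the first match as "start,end:group", or returns null if none.
    static String describeFirst(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        if (!m.find()) {
            return null;
        }
        return m.start() + "," + m.end() + ":" + m.group();
    }

    public static void main(String[] args) {
        System.out.println(Pattern.matches("\\d*", "a")); // false: "a" is not a digit
        System.out.println(Pattern.matches("\\d*", ""));  // true: zero digits is allowed
        System.out.println(describeFirst("\\d*", "ab34ef")); // 0,0: (empty group)
    }
}
```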

Vijay Raj
The regex engine checks the character between index 0 and index 1 and finds zero or more '\d's there. Therefore, it returns 0 as m.start() and "" as m.group(), because it found no '\d'. Similarly, it checks the character between index 1 and index 2, and so on. After checking the character between index 5 and index 6, it goes to index 6 to do a zero-length match.

Now, what I wanted to ask was why it didn't do a zero-length match at the beginning, at index 0.

regards,
vijay.

Henry Wong
Now, what I wanted to ask was that why didn't it do a zero length match in the beginning, at index 0.

Here is the breakdown of the results:

0 - zero length match before the first character -- at index 0
1 - zero length match after the previous match -- at index 1
234 - A match of "34" at index 2
4 - zero length match after the previous match -- at index 4
5 - zero length match after the previous match -- at index 5
6 - zero length match after the previous match -- at index 6

Henry
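Printing `m.end()` alongside `m.start()` and `m.group()` makes the breakdown above visible: every empty match has equal start and end. A sketch, again assuming the input `"ab34ef"` and using illustrative names:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class MatchBreakdown {
    // One "start,end:group" entry per match reported by find().
    static List<String> breakdown(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        List<String> rows = new ArrayList<>();
        while (m.find()) {
            rows.add(m.start() + "," + m.end() + ":" + m.group());
        }
        return rows;
    }

    public static void main(String[] args) {
        for (String row : breakdown("\\d*", "ab34ef")) {
            System.out.println(row);
        }
    }
}
```

The rows should be `0,0:`, `1,1:`, `2,4:34`, `4,4:`, `5,5:` and `6,6:`, one per line of the breakdown. The `6,6` row is the empty match at the end of the input, and there is no second row at 0 because `find()` resumes at index 1 after the empty match there.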