
Matcher.find() -> looks one past end of String?

 
Richard Parker
Ranch Hand
Posts: 70
Hello,

I have a question about the Chapter 6 Self Test question #1 in the K&B book.
Here is the question from the book:

-------------------------------------------
Given:

import java.util.regex.*;

class Regex2 {
    public static void main(String[] args) {
        String pattern = "\\d*";
        String source = "ab34ef";

        Pattern p = Pattern.compile(pattern);
        Matcher m = p.matcher(source);

        boolean b = false;
        while (b = m.find()) {
            // Print each match's start index immediately followed by the matched text.
            System.out.print(m.start() + m.group());
        }
    }
}

And the command line:

java Regex2 "\d*" ab34ef

-------------------------------------------
Correct Answer: 01234456
-------------------------------------------

My question is: does Matcher.find() always look one position past the end of the source String? In this example I would have thought the answer to be:

0123445

because there are no more characters after position 5.

Any thoughts on this will be greatly appreciated.
Thanks in advance,

Richard
 
Jesse Custer
Ranch Hand
Posts: 45
SCJP FAQ Page
 
Richard Parker
Ranch Hand
Posts: 70
Sweet!
Thanks for the link (I'll probably be referring to this often.)

So:
"The asterisk (*) is a "greedy quantifier," specifying that whatever precedes it (in this case, any digit) should be matched zero or more times. By allowing for zero occurrences, a match of zero length is possible. Because a match of zero length is possible, the find() method will check the index following the last character of input."

A match of zero length for greedy quantifiers seems weird.
Something to definitely keep in mind.
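
To see the zero-length matches directly, here is a minimal sketch that records start(), end(), and group() for every match of "\\d*" in "ab34ef". The class name ZeroLengthTrace and the helper trace are made up for illustration; only Pattern and Matcher come from java.util.regex.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class ZeroLengthTrace {
    // Builds one line per match: start index, end index, and the matched text in quotes.
    static String trace(String regex, String source) {
        Matcher m = Pattern.compile(regex).matcher(source);
        StringBuilder sb = new StringBuilder();
        while (m.find()) {
            sb.append(m.start()).append("-").append(m.end())
              .append(" \"").append(m.group()).append("\"\n");
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String source = "ab34ef";
        // Prints six lines: 0-0 "", 1-1 "", 2-4 "34", 4-4 "", 5-5 "", 6-6 ""
        System.out.print(trace("\\d*", source));
    }
}
```

The last match is 6-6 with an empty group, and 6 is source.length(). So find() is not reading a character past the end of the String; it is matching the empty string at the end position, which is a legal place for a zero-length match.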

Thank you!
 
Jesse Custer
Ranch Hand
Posts: 45
You're welcome.

By the way, regex is something that took me quite some time to understand. It doesn't always behave the way you think it will.

For example, take the same code you gave but change the pattern to "\\d*?". What do you think the result will be?
 
Javier Sanchez Cerrillo
Ranch Hand
Posts: 152
For example, take the same code you gave but change the pattern to "\\d*?". What do you think the result will be?


Reluctant quantifiers are not covered on the exam, and neither are possessive quantifiers.
 
Bijendra S. Rajput
Ranch Hand
Posts: 41
Hi Jesse,

I am really surprised by the output of this program when run with

java Regex2 "\d*?" ab34df

m.group() is not printing anything, and I am confused.

Can you help me, please?
 
Bijendra S. Rajput
Ranch Hand
Posts: 41
Sorry, I forgot to include the output:

0123456
 
Jesse Custer
Ranch Hand
Posts: 45
I just checked it, and Javier is right. Only greedy quantifiers are on the exam. But reluctant quantifiers are in the K&B book so it's not totally irrelevant.

What you probably expected was that m.group() would print '3' at position 2 and '4' at position 3, which would give the output: 012334456

The expression "\\d*?" searches for zero or more digits, and since it uses a reluctant quantifier it will match AS LITTLE AS POSSIBLE. At position 2 it comes across '3', which is a digit. Instead of returning this digit, it returns zero digits, because an empty match is the shortest match that still satisfies the expression. The same happens at position 3 with '4'.
So the output is indeed: 0123456
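
A quick way to confirm this is to check whether any match is non-empty. The sketch below is only an illustration (the class name ReluctantCheck and the helper allMatchesEmpty are hypothetical). Note that after a successful find(), m.group() returns an empty String for a zero-length match, not null, which is why nothing visible is printed for it.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class ReluctantCheck {
    // Returns true if every match of regex in source has length zero.
    static boolean allMatchesEmpty(String regex, String source) {
        Matcher m = Pattern.compile(regex).matcher(source);
        while (m.find()) {
            if (m.group().length() != 0) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(allMatchesEmpty("\\d*?", "ab34ef")); // true: reluctant never takes a digit
        System.out.println(allMatchesEmpty("\\d*", "ab34ef"));  // false: greedy matches "34"
    }
}
```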

I hope this explanation makes it clearer, because I find it hard to explain.
Try playing with the next piece of code if it's still unclear what the regex returns.
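
A minimal sketch of such an experiment follows. It is not the snippet from the original post; the class name RegexPlayground, the helper run, and the choice of patterns are assumptions made for illustration. It repeats the Regex2 loop for several quantifiers so the results can be compared side by side.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class RegexPlayground {
    // Concatenates m.start() + m.group() for every match, like the loop in Regex2.
    static String run(String regex, String source) {
        Matcher m = Pattern.compile(regex).matcher(source);
        StringBuilder out = new StringBuilder();
        while (m.find()) {
            out.append(m.start()).append(m.group());
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String source = "ab34ef";
        // Greedy and reluctant forms of "zero or more" and "one or more" digits.
        String[] patterns = { "\\d*", "\\d*?", "\\d+", "\\d+?" };
        for (String p : patterns) {
            System.out.println(p + " -> " + run(p, source));
        }
    }
}
```

With "ab34ef" this gives 01234456 for \d*, 0123456 for \d*?, 234 for \d+, and 2334 for \d+?. The last two show that once at least one digit is required, zero-length matches disappear, and the reluctant form takes one digit at a time.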


Greetings
 