
Regex Find, Start and Group

 
Joshua Smith
Ranch Hand
Posts: 193
All-

I was working through some of the mock questions provided by Kathy Sierra and Bert Bates and had a question about regular expressions. Maybe someone here can clear things up for me.

It's question #7 from the mock questions that they posted to the list.

I've modified the code slightly so it's clearer which output is coming from which method. My version of their code is as follows:
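(The listing itself is missing from this copy of the thread. What follows is only a minimal sketch that reproduces the output shown below, assuming the pattern "\\d*" mentioned later in the thread and the input ab34ef. The class name RegexFind and the helper run are hypothetical, not taken from the original mock question.)

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RegexFind {
    // Hypothetical helper: returns one "start:[group]" line per successful find().
    static String run(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        StringBuilder out = new StringBuilder();
        while (m.find()) {
            // start() is the index where the match begins, group() is the matched text
            out.append(m.start()).append(":[").append(m.group()).append("]\n");
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // Assumed pattern and input, chosen to match the output quoted in the post
        System.out.print(run("\\d*", "ab34ef"));
    }
}
```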



The output is:

0:[]
1:[]
2:[34]
4:[]
5:[]
6:[]


As I understand what's happening, the find() method is walking the String ab34ef from left to right, looking for matches. If it finds one, then it's available via the group() method. If it doesn't find one, then you get a zero-length String from the group method.
For position 0 it finds "a" (which doesn't match) so we get "0:[]".
For position 1 it finds "b" (which doesn't match) so we get "1:[]".
For position 2 it finds "34" (a match) and so we get "2:[34]".
Position 3 is gobbled up by the match, so we don't get output for it.
For position 4 it finds "e" (which doesn't match) so we get "4:[]".
For position 5 it finds "f" (which doesn't match) so we get "5:[]".
For position 6 it finds???

That's my question. I'm not sure why we have a 6th find. Is there some sort of implied String terminator ($ in Perl regex speak) that it's finding?

Any ideas?
Thanks,
Josh
 
Barry Gaunt
Ranch Hand
Posts: 7729
I'll take a guessful stab at this. If you look at the API for Matcher, the group() method, it says:
Note that some patterns, for example a*, match the empty string. This method will return the empty string when the pattern successfully matches the empty string in the input.


So when the find is beginning at character position 6, there is only the empty string left. Because the pattern is "\\d*", this must match (because * means zero or more of the preceding "\\d").

Does that make sense? If so, convince me.
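One way to check this is to start the search directly at index 6 and compare "\\d*" with "\\d+". This is a small sketch, not part of the original question; the class name EmptyMatchAtEnd and the helper describe are hypothetical. It relies on two documented Matcher behaviours: find(int) accepts a start index equal to the input length, and group() throws IllegalStateException when the previous match attempt failed.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class EmptyMatchAtEnd {
    // Hypothetical helper: runs find(from) and describes what happened.
    static String describe(String regex, String input, int from) {
        Matcher m = Pattern.compile(regex).matcher(input);
        if (m.find(from)) {
            // A successful match, possibly of zero length
            return m.start() + ":[" + m.group() + "]";
        }
        try {
            // After a failed find(), group() does not return ""; it throws
            return m.group();
        } catch (IllegalStateException e) {
            return "no match";
        }
    }

    public static void main(String[] args) {
        System.out.println(describe("\\d*", "ab34ef", 6)); // prints 6:[]
        System.out.println(describe("\\d+", "ab34ef", 6)); // prints no match
    }
}
```

So the empty brackets in the output are successful zero-length matches, not failed finds: a failed find() ends the while loop and never prints anything.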
 
Ryan Kade
Ranch Hand
Posts: 69
I think that's exactly right, Barry. The Java tutorial on the topic says:


A zero-length match can occur in several cases: in an empty input string, at the beginning of an input string, after the last character of an input string, or in between any two characters of an input string.


Convincing?

http://java.sun.com/docs/books/tutorial/extra/regex/quant.html
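The cases in that quote can be seen by collecting only the zero-length matches. A rough sketch follows; the class name ZeroLengthMatches and the helper zeroLengthStarts are made up for illustration, and the pattern "a*" is borrowed from the API note quoted earlier.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ZeroLengthMatches {
    // Hypothetical helper: start indices of every zero-length match found by find().
    static List<Integer> zeroLengthStarts(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        List<Integer> starts = new ArrayList<>();
        while (m.find()) {
            if (m.start() == m.end()) { // start == end means nothing was consumed
                starts.add(m.start());
            }
        }
        return starts;
    }

    public static void main(String[] args) {
        System.out.println(zeroLengthStarts("a*", ""));         // empty input: [0]
        System.out.println(zeroLengthStarts("a*", "bb"));       // start, between, after last: [0, 1, 2]
        System.out.println(zeroLengthStarts("\\d*", "ab34ef")); // [0, 1, 4, 5, 6]
    }
}
```

Index 2 is absent from the last list because the match there is "34", which is not zero-length; index 3 is absent because it sits inside that match.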
 
Bert Bates
author
Sheriff
Posts: 8900
 
Joshua Smith
Ranch Hand
Posts: 193
Clever Barry. :-)

And thanks for the confirmation Ryan and Bert.

A co-worker and I puzzled over that one a bit and were leaning towards a zero-length string, a null string, some sort of invisible terminator etc. It's just nice to see in writing what is actually happening. Helps to "convince me" too. :-)

Josh
 