I understand how they get to 01234: the 2 is the index at which 34 starts, and 34 is what group() returns. But where on earth does the 456 come from? Especially the 6, given that the source, ab34ef, is zero-based and has a top index of only 5? It's driving me crazy. Any ideas?
Hoping that this is more than a curiosity for you:
I once used a proprietary language (for telecom real-time software) in which strings were coded
as a length (the number of characters), followed by the characters themselves. The length is accessible
as an index too, so that leaves you with a string that has more indexes than characters...
I don't know if that's the way Java handles things...
Thanks, Mano. It's not a mere curiosity for me. I'm studying for the OCP 7 exam and this threw me. I thought the "hidden" characters at the beginning and end of String only applied to Regex \b and \B. It's still confusing, but I'll sort it out eventually. Strange concept.
The pattern \\d* means a sequence of zero or more digits. It can match at every position of the input string, because at every position there is a run of digits of length at least 0. At indexes 0 and 1 the match is empty. At index 2 it finds 34, which consumes indexes 2 and 3, so the next attempt starts at index 4. From there on the matches are empty again.
6 is indeed not a valid index in the given String (its length is 6, so valid indexes run from 0 to 5). However, "beginning" and "end" are concepts in pattern matching that do not have to correspond to a character index, as this example shows. The last character of the input String is at index 5, but from the perspective of the matcher the string "ends" at position 6. You can think of it as a terminator position that exists just after the last character of the string.
The given pattern matches at that end position as well (the star allows a zero-length match), so the matcher reports a match starting at 6 too.
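Here is a minimal sketch that reproduces the output being discussed. The exam code itself is not quoted in this thread, so the loop is an assumption: it appends start() followed by group() for every match. The class name DigitStarDemo and the helper trace are made up for illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class DigitStarDemo {
    // Hypothetical helper: concatenates start() + group() for every match,
    // which is how the exam output is assumed to be produced.
    static String trace(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        StringBuilder out = new StringBuilder();
        while (m.find()) {
            // For a zero-length match, group() is the empty string,
            // so only the start position gets appended.
            out.append(m.start()).append(m.group());
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // Matches: "" at 0, "" at 1, "34" at 2, "" at 4, "" at 5, "" at 6
        System.out.println(trace("\\d*", "ab34ef")); // prints 01234456
    }
}
```

Note that nothing is reported for position 3: the match that starts at 2 already consumed it, so the next search begins at 4.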
Change the pattern from "\\d*" to "$" and you will see that it prints 6, because $ matches "end of input" and 6 is where the input ends.
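To try the "$" experiment on its own, something like the following should do. It is only a sketch; the class name EndAnchorDemo and the helper endPosition are invented for this example, and the input is assumed to contain no line terminators.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class EndAnchorDemo {
    // Hypothetical helper: returns the position at which "$" first matches.
    static int endPosition(String input) {
        Matcher m = Pattern.compile("$").matcher(input);
        if (!m.find()) {
            throw new IllegalStateException("no match");
        }
        return m.start();
    }

    public static void main(String[] args) {
        String input = "ab34ef";
        System.out.println(endPosition(input)); // prints 6
        System.out.println(input.length());     // prints 6
    }
}
```

The match position equals input.length(), one past the last valid character index, which is why 6 can show up even though charAt(6) would throw.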