
# Not sure if there is some mistake in SCJP6 for regex

Vishal Hegde
Ranch Hand
Posts: 1087
I came across the description below. Somehow I feel it is wrong with regard to the occurrence at index 11.

0[xX][0-9a-fA-F]

The preceding expression could be stated: "Find a set of characters in which the first character is a "0", the second character is either an "x" or an "X", and the third character is either a digit from "0" to "9", a letter from "a" to "f" or an uppercase letter from "A" to "F"." Using the preceding expression, and the following data,

source: "12 0x 0x12 0Xf 0xg"
index:   012345678901234567

regex would return 6 and 11. (Note: 0x and 0xg are not valid hex numbers.)

As a second step, let's think about an easier problem. What if we just wanted regex to find occurrences of integers? Integers can be one or more digits long, so it would be great if we could say "one or more" in an expression. There is a set of regex constructs called quantifiers that let us specify concepts such as "one or more." In fact, the quantifier that represents "one or more" is the "+" character. We'll see the others shortly.

Bartender
Posts: 10780

Vishal Hegde wrote: I came across the description below. Somehow I feel it is wrong with regard to the occurrence at index 11.

And why is that? Assuming the writer is talking about Matcher.find(), it seems perfectly reasonable to me.

Winston

Matthew Brown
Bartender
Posts: 4568
Why don't you think there's a match at 11? Looks like one to me.

Vishal Hegde
Ranch Hand
Posts: 1087

Matthew Brown wrote: Why don't you think there's a match at 11? Looks like one to me.

Sorry, not index 11 but index 6. It said the first character should be 0, the second should be x or X, and the third should be either 0-9, a-f or A-F. But the match at index 6 is 0x12: the first two characters, 0 and x, are correct, but how can the third value, 12, be correct? Shouldn't the value be in the range 0-9 only?

Jesper de Jong
Java Cowboy
Posts: 16084
So? The 1 is in the range 0-9, isn't it? The 2 isn't being looked at at all.

Vishal Hegde
Ranch Hand
Posts: 1087

Jesper de Jong wrote: So? The 1 is in the range 0-9, isn't it? The 2 isn't being looked at at all.

Correct me if I am wrong. 0[xX][0-9a-fA-F] means that the first character should be 0, the second should be either x or X, and the third should be one of 0-9, a-f or A-F.

I see 1 and then 2, so I assume the regex should be something like 0[xX][0-9][0-9]; as I see it, that would correctly match 0x12.

Bartender
Posts: 1166

Vishal Hegde wrote:

Jesper de Jong wrote: So? The 1 is in the range 0-9, isn't it? The 2 isn't being looked at at all.

Correct me if I am wrong. 0[xX][0-9a-fA-F] means that the first character should be 0, the second should be either x or X, and the third should be one of 0-9, a-f or A-F.

I see 1 and then 2, so I assume the regex should be something like 0[xX][0-9][0-9]; as I see it, that would correctly match 0x12.

Matcher.find() has to be used for this to make sense in the first place. Your version forces a minimum of two digits before matching, whereas the original regex requires only one hex digit. BUT, importantly, the original does not preclude a second, third, fourth or any number of further hex digits following the one it matches. If you want to force exactly two hex digits then you need 0[xX][0-9A-Fa-f]{2}[^0-9A-Fa-f], which says match exactly two hex digits followed by a character that is not a hex digit. Note that the trailing [^0-9A-Fa-f] consumes that extra character, so this version does not match when the two hex digits come at the very end of the input.
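
For anyone who wants to try the "exactly two hex digits" idea, here is a minimal sketch. The first pattern is the one given above. The second swaps the trailing character class for a negative lookahead, (?![0-9A-Fa-f]); that is an alternative I am assuming here, not something from the book or the post. The class name, helper name and test string are made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class TwoHexDigitsDemo {
    // Pattern from the post: two hex digits followed by one non-hex character.
    static final Pattern WITH_TRAILING_CLASS =
            Pattern.compile("0[xX][0-9A-Fa-f]{2}[^0-9A-Fa-f]");

    // Assumed alternative: a negative lookahead checks the next character
    // without consuming it, and also succeeds at the end of the input.
    static final Pattern WITH_LOOKAHEAD =
            Pattern.compile("0[xX][0-9A-Fa-f]{2}(?![0-9A-Fa-f])");

    // Collects the text of every match that Matcher.find() reports.
    static List<String> findAll(Pattern pattern, String input) {
        Matcher m = pattern.matcher(input);
        List<String> matches = new ArrayList<>();
        while (m.find()) {
            matches.add(m.group());
        }
        return matches;
    }

    public static void main(String[] args) {
        String source = "0x1 0x12 0x123 0xAB"; // hypothetical test data

        // Only "0x12 " matches, trailing space included; "0xAB" at the end is missed.
        System.out.println(findAll(WITH_TRAILING_CLASS, source));

        // "0x12" and "0xAB" match; "0x1" and "0x123" are rejected.
        System.out.println(findAll(WITH_LOOKAHEAD, source));
    }
}
```

Which of the two is preferable depends on whether the character after the number should be part of the match.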

Jesper de Jong
Java Cowboy
Posts: 16084
The regex 0[xX][0-9a-fA-F] only matches three characters. If you call find() to find matches of this regex in the string "12 0x 0x12 0Xf 0xg", it's going to look where in that string there are characters that match the regex.

It finds a match at position 6, because the characters "0x1" match. It doesn't matter what comes after "0x1". It also finds a match at position 11, because "0Xf" also matches.

Note that the regex matcher specifically does not split the text into tokens separated by spaces, which is what you seem to assume. Whether there is a digit, a space, or some other character after the match, doesn't matter.
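
A small sketch to check this yourself, assuming the book means Matcher.find() as discussed earlier in the thread. The class and method names are just for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class HexFindDemo {
    // Returns the start index of every match that Matcher.find() reports.
    static List<Integer> findStarts(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        List<Integer> starts = new ArrayList<>();
        while (m.find()) {
            starts.add(m.start());
        }
        return starts;
    }

    public static void main(String[] args) {
        String source = "12 0x 0x12 0Xf 0xg";
        String regex = "0[xX][0-9a-fA-F]";

        // find() scans for the pattern anywhere in the input.
        System.out.println(findStarts(regex, source)); // prints [6, 11]

        // matches() requires the whole input to match the pattern, so it is false here.
        System.out.println(Pattern.matches(regex, source)); // prints false
    }
}
```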

Vishal Hegde
Ranch Hand
Posts: 1087
Thanks Jesper, your post cleared up my doubt.
I have a few questions though: what are tokens in Java? And suppose there is a regex [0-9]: it will not match a complete number such as '22', right? It will only match a single digit in the range 0-9?

Jesper de Jong
Java Cowboy
Posts: 16084
The regex [0-9] matches a single digit which can be 0, 1, 2, 3, 4, 5, 6, 7, 8 or 9. If you use find() on the string "22" with the regex [0-9], it will find two matches, the "2" at position 0 and the "2" at position 1.

A regex such as [0-9], which matches a single character, is not going to find "22", which is two characters, as a match.

The word "tokens" as I used it doesn't have a special meaning in Java in general.
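
Here is a quick sketch of the single-digit behaviour described above, next to the "+" ("one or more") quantifier that the book excerpt mentions. The class and helper names are made up for the example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class DigitQuantifierDemo {
    // Collects the text of every match that Matcher.find() reports.
    static List<String> findAll(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        List<String> matches = new ArrayList<>();
        while (m.find()) {
            matches.add(m.group());
        }
        return matches;
    }

    public static void main(String[] args) {
        // [0-9] matches one character at a time: two separate matches.
        System.out.println(findAll("[0-9]", "22"));  // prints [2, 2]

        // [0-9]+ matches a run of one or more digits: a single match.
        System.out.println(findAll("[0-9]+", "22")); // prints [22]
    }
}
```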

lowercase baba
Posts: 13082
To elaborate further: you can think of it as starting at each and every position in the target string and applying the pattern there to see if it matches. So, with a regex of 0[xX][0-9a-fA-F] and a target string of "12 0x 0x12 0Xf 0xg", which is 18 characters long, you apply the regex 18 times.

start at position 0 - the character '1'. Does the pattern match if I start here? '1' does not match '0', so don't return this.
start at position 1 - the character '2'. Does the pattern match if I start here? '2' does not match '0', so don't return this.
start at position 2 - the character ' '. Does the pattern match if I start here? ' ' does not match '0', so don't return this.
start at position 3 - the character '0'. Does the pattern match if I start here? '0' does match '0'. Does 'x' match [xX] (x or X)? Yes. Does ' ' match [0-9a-fA-F]? no.
etc...
start at position 6 - the character '0'. Does the pattern match if I start here? '0' does match '0'. Does 'x' match [xX] (x or X)? Yes. Does '1' match [0-9a-fA-F]? Yes. Since that is the end of the pattern, I should return position '6' as a match.
start at position 7 - the character 'x'. Does the pattern match if I start here? 'x' does not match '0', so don't return this.
etc.
start at position 11 - the character '0'. Does the pattern match if I start here? '0' does match '0'. Does 'X' match [xX] (x or X)? Yes. Does 'f' match [0-9a-fA-F]? Yes. Since that is the end of the pattern, I should return position '11' as a match.
etc...
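
The walk-through above can be sketched in code roughly like this. It is only an illustration of the mental model: it uses Matcher.region() and Matcher.lookingAt() to try the pattern at every start position, which is my own way of mimicking the description, not how the regex engine is necessarily implemented. The class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class PositionWalkDemo {
    // Tries the pattern at every start position and records where it matches.
    static List<Integer> startsByPosition(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        List<Integer> starts = new ArrayList<>();
        for (int i = 0; i < input.length(); i++) {
            // Restrict the matcher to input[i..end) ...
            m.region(i, input.length());
            // ... and ask whether the pattern matches starting exactly at i.
            if (m.lookingAt()) {
                starts.add(i);
            }
        }
        return starts;
    }

    public static void main(String[] args) {
        String source = "12 0x 0x12 0Xf 0xg";
        System.out.println(startsByPosition("0[xX][0-9a-fA-F]", source)); // prints [6, 11]
    }
}
```

One difference from a real find() loop: after a match, find() resumes at the end of that match rather than at the next character, so overlapping matches are not reported twice. For this pattern and string the two approaches give the same answer.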
