
Regular expressions: Grouping

 
Ranch Hand
Posts: 491
Input: "XY 9999 A-C 24.12x9 blue,red"

Basically, I want to have 4 groups
XY 9999
A-C
24.12x9
blue,red

I wrote


When tested, I got the output as below (5 groups)

GW 1177 A-C 20.25x7 blue,red ====> This is what I do not want
GW 1177
A-C
20.25x7
blue,red

I am not sure I understand the regex "group" concept correctly. What am I missing in the pattern I coded?

1M Thanks.
 
Saloon Keeper
Posts: 7993
Have you read the documentation for groupCount() and group() carefully?
 
H Paul
Ranch Hand
Posts: 491

public String group(int group)

Capturing groups are indexed from left to right, starting at one. Group zero denotes the entire pattern, so the expression m.group(0) is equivalent to m.group().


groupCount
public int groupCount()

Returns the number of capturing groups in this matcher's pattern.
Group zero denotes the entire pattern by convention. It is not included in this count.




If I read the doc correctly, I should start my index from 1 (not from 0 as coded).

Is this correct?
 
Bartender
Posts: 2700
Did it work when you tried it?
 
H Paul
Ranch Hand
Posts: 491
Yes, starting the index at 1 fixed that issue.

But overall, I still have to see how regular expression groups work in general. For now, thanks.
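For readers following along, here is a minimal sketch of the indexing described in the quoted documentation: group 0 is the whole match, and the capturing groups run from 1 to groupCount() inclusive. The pattern and the class name GroupIndexDemo are assumptions made for illustration, since the original pattern is not shown in the thread.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class GroupIndexDemo {
    // Hypothetical pattern with four capturing groups (not the poster's original).
    static final Pattern PATTERN =
            Pattern.compile("([A-Z]+ \\d+) ([A-Z]-[A-Z]) (\\S+) (\\S+)");

    // Returns only the capturing groups, skipping group 0 (the entire match).
    static List<String> captureGroups(String input) {
        List<String> groups = new ArrayList<>();
        Matcher m = PATTERN.matcher(input);
        if (m.matches()) {
            // Start at 1 and include groupCount(): group 0 is not part of the count.
            for (int i = 1; i <= m.groupCount(); i++) {
                groups.add(m.group(i));
            }
        }
        return groups;
    }

    public static void main(String[] args) {
        for (String g : captureGroups("GW 1177 A-C 20.25x7 blue,red")) {
            System.out.println(g);
        }
    }
}
```

Looping from 0 instead would print the whole input first, which is the extra fifth line in the output of the first post.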
 
Greenhorn
Posts: 14
General remark: using Matcher.matches(), or a regular expression that starts with ^ and ends with $, makes the code a bit less error prone - currently you may match strings that contain spurious information around the part you are interested in.
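A small sketch of the difference, assuming a hypothetical sub-pattern for the "XY 9999" part only: Matcher.find() accepts the pattern anywhere in the input, while Matcher.matches() and an explicitly anchored pattern reject input with extra text around it. The class and method names are made up for this example.

```java
import java.util.regex.Pattern;

class AnchoredMatchDemo {
    // Hypothetical sub-pattern: uppercase letters, a space, then digits.
    static final String REGEX = "[A-Z]+ \\d+";

    // find() succeeds if the pattern occurs anywhere in the input.
    static boolean foundAnywhere(String s) {
        return Pattern.compile(REGEX).matcher(s).find();
    }

    // matches() requires the entire input to match the pattern.
    static boolean matchesWhole(String s) {
        return Pattern.compile(REGEX).matcher(s).matches();
    }

    // Anchoring with ^ and $ makes find() reject surrounding text as well.
    static boolean foundAnchored(String s) {
        return Pattern.compile("^" + REGEX + "$").matcher(s).find();
    }

    public static void main(String[] args) {
        String noisy = "junk XY 9999 trailing";
        System.out.println(foundAnywhere(noisy)); // true
        System.out.println(matchesWhole(noisy));  // false
        System.out.println(foundAnchored(noisy)); // false
    }
}
```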
 
H Paul
Ranch Hand
Posts: 491
(a side note: will look into ^ and $)



Input: "XY 9999 A-C 24.12x9 blue,red"

I want the "entire" input string to match the pattern
pattern = firstGroup + space + secondGroup + space + thirdGroup + space + fourthGroup;


And I got what I wanted

GW 1177
A-C
20.25x7
blue,red

Now if I change
thirdGroup = "((\\d+| \\d+\\.\\d++)x(\\d+| \\d+\\.\\d++))"; // 99.9x12 or 9x10 for example

then I get nothing, since matcher.matches() returns false.

Syntax-wise, what should thirdGroup be, or what is missing, so that I get back the 4 groups?

1M Thanks.
 
H Paul
Ranch Hand
Posts: 491
1. Syntax-wise, corrected. (no ++ and no space)

thirdGroup = "((\\d+|\\d+\\.\\d+)x(\\d+|\\d+\\.\\d+))";

2. Now I got 6 groups with thirdGroup broken down into 2 extra sub-groups.

GW 1177
A-C
20.25x7 === thirdGroup
20.25 === sub-group
7 === sub-group
blue,red
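One way to get back to exactly 4 groups is to make the inner alternations non-capturing with (?:...), so that only the outer parentheses of thirdGroup count. The sketch below reuses the corrected thirdGroup from this post; the other three sub-patterns are assumptions, since they are not shown in the thread.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class NonCapturingDemo {
    // Hypothetical sub-patterns for groups 1, 2 and 4 (not shown in the thread).
    static final String FIRST = "([A-Z]+ \\d+)";
    static final String SECOND = "([A-Z]-[A-Z])";
    static final String FOURTH = "([a-z]+(?:,[a-z]+)*)";
    // thirdGroup from the post, with the inner groups changed to (?:...).
    static final String THIRD = "((?:\\d+|\\d+\\.\\d+)x(?:\\d+|\\d+\\.\\d+))";

    static final Pattern PATTERN =
            Pattern.compile(FIRST + " " + SECOND + " " + THIRD + " " + FOURTH);

    // Returns capturing groups 1..groupCount() if the whole input matches.
    static List<String> groups(String input) {
        List<String> result = new ArrayList<>();
        Matcher m = PATTERN.matcher(input);
        if (m.matches()) {
            for (int i = 1; i <= m.groupCount(); i++) {
                result.add(m.group(i));
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Prints the four groups, one per line, without the two sub-groups.
        groups("GW 1177 A-C 20.25x7 blue,red").forEach(System.out::println);
    }
}
```

Non-capturing groups still group for alternation and quantifiers; they just do not get a group number.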
 
Maarten Bodewes
Greenhorn
Posts: 14
Without looking into the regexp, I think you are on your way now. So I will give you some very important hints regarding regular expressions:
0) make sure there isn't already something that parses your input
1) don't make them too complex; you're better off creating a hierarchy and mixing parsing techniques - e.g. first split things with String.split() if possible
2) describe them well; it may take a very long time to read a regexp, so describe what you are trying to accomplish
3) create at least a couple of JUnit tests around them, with (at least) some corner cases, the expected good scenarios and possibly some bad ones
4) don't use them as a technique to e.g. test ranges of numbers, dates etc.; there are better tools for that, just test string input
5) remember that groups are *not* repetitive: a repeated group only keeps its last match (use Matcher.find() instead, or use other ways of repeating things from within the language; this trap caught me a few times)
6) learn to use non-capturing groups (you already got this one) and reluctant quantifiers
7) use FindBugs to make sure your regexps are at least valid at build time (FindBugs can also check format strings)
8) use plugins for your favourite IDE that enable you to test regexps and their input in real time (extra points if they auto-escape backslashes)
9) don't try to learn Pattern.html by heart, just the general techniques; that's what bookmarks and Google are for

Finally, never forget that the Java regexp engine is brilliantly strong and works on actual Unicode strings - don't get disappointed if other languages don't give you that same robustness or flexibility.
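To illustrate hint 5, here is a minimal sketch with a hypothetical comma-separated input: a capturing group inside a repetition only retains the text of its last iteration, whereas looping with Matcher.find() yields every item. The class and method names are made up for the example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class RepeatedGroupDemo {
    // The repeated group is overwritten on each iteration, so only the
    // last word survives in group(1).
    static String lastRepetition(String csv) {
        Matcher m = Pattern.compile("(?:([a-z]+),?)+").matcher(csv);
        return m.matches() ? m.group(1) : null;
    }

    // Repeating from within the language instead: one find() per word.
    static List<String> allWords(String csv) {
        List<String> words = new ArrayList<>();
        Matcher m = Pattern.compile("[a-z]+").matcher(csv);
        while (m.find()) {
            words.add(m.group());
        }
        return words;
    }

    public static void main(String[] args) {
        System.out.println(lastRepetition("blue,red,green")); // green
        System.out.println(allWords("blue,red,green"));       // [blue, red, green]
    }
}
```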
 
H Paul
Ranch Hand
Posts: 491
(I need time to digest the above/previous advice.)



Question:

I have an input string: XYZ 100 green low bowl

I am trying to capture it into 2 groups:

XYZ 100 -- anything except lower case, that is the rule
green low bowl -- anything except upper case, that is the rule

Syntax-wise, is there something missing? The code did not work for the case in question.

String FirstGroup ="([.&&[^a-z]]*)";
String SecondGroup ="([.&&[^A-Z]]*)";
 
Maarten Bodewes
Greenhorn
Posts: 14
Yeah, sorry, that was maybe a bit much. Tip 8 though is a pretty useful one, since you can take your regexp one piece at a time. In this case you are probably expecting the dot to match any character, but inside square brackets it is just a literal dot, so [.&&[^a-z]] can only ever match the character ".".
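A sketch of one possible fix, under stated assumptions: drop the dot and the intersection and use the negated classes [^a-z] and [^A-Z] directly. The single space between the two groups is an assumption about the input format, and the class and method names are hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class CharClassDemo {
    // The class from the question: a literal dot intersected with "not a-z".
    static final Pattern BROKEN_CLASS = Pattern.compile("[.&&[^a-z]]");

    // Assumed fix: plain negated classes, separated by one space.
    static final Pattern FIXED = Pattern.compile("([^a-z]*) ([^A-Z]*)");

    // True only if the single-character input is accepted by the original class.
    static boolean brokenClassMatches(String ch) {
        return BROKEN_CLASS.matcher(ch).matches();
    }

    // Returns {group 1, group 2}, or null if the whole input does not match.
    static String[] split(String input) {
        Matcher m = FIXED.matcher(input);
        return m.matches() ? new String[] { m.group(1), m.group(2) } : null;
    }

    public static void main(String[] args) {
        System.out.println(brokenClassMatches(".")); // true
        System.out.println(brokenClassMatches("X")); // false
        String[] parts = split("XYZ 100 green low bowl");
        System.out.println(parts[0]); // XYZ 100
        System.out.println(parts[1]); // green low bowl
    }
}
```

Without the explicit space in the pattern, the first group would also swallow the trailing space after "100", because a space is not a lower case letter.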
 
H Paul
Ranch Hand
Posts: 491


Above code works.

"Thousands of candles can be lit from a single candle, and the life of the candle will not be shortened. Happiness never decreases by being shared."


Thank you for candle #8. I just downloaded the Eclipse RegEx plugin.
 