
Regex to parse arguments

I'm working on parsing a string from an RFC, and I can't get my regex to work. So I've written a small Java program to test. I don't understand the results, so I can't figure out what I'm doing wrong.

The applicable section deals with a "type=" string.

The regex that I'm using is:

The specs are that there can be either a series of type=X separated by semicolons,
type=X;type=Y;type=Z
or you can have a series of arguments,
type=X,Y,Z
where the X values are keywords

It seems to work fine for the "type=X;type=Y" model.
The output doesn't show a proper greedy match for the series of keywords separated by commas, such as



Thanks
pat

Unfortunately, I think you are confusing how regex groups work. Group 1 is always the first parenthesis. Group 2 is always the second parenthesis. etc.

For example, let's say your pattern is "(hello)*". You can match a long string of 100 hello strings, but in terms of the number of groups, there will be only one group, for the one parenthesis. And its value will be assigned to the last match of the subgroup.
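A minimal sketch of that point, using the "(hello)*" pattern from the post. The class and method names are made up for illustration; only the java.util.regex calls are real API.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class RepeatedGroupDemo {
    private static final Pattern HELLOS = Pattern.compile("(hello)*");

    // The number of groups is fixed by the parentheses in the pattern,
    // not by how many times the group repeats in the input.
    static int groupCount() {
        return HELLOS.matcher("").groupCount();
    }

    // Returns what group 1 holds after a full match, or null if there is no match.
    static String lastCapture(String input) {
        Matcher m = HELLOS.matcher(input);
        return m.matches() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        System.out.println(groupCount());                    // 1
        System.out.println(lastCapture("hellohellohello"));  // hello
    }
}
```

However many times the group repeats, groupCount() stays at 1 and group(1) holds only the last repetition.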

Henry
 

type=CELL,pref,msg:(703) 555-8914
gc: 1 = CELL
gc: 2 = ,msg
gc: 3 = msg
gc: 4 = null
gc: 5 = null
gc: 6 = null
gc: 7 = null



So, the first match is CELL, which is the first paren. The second is ",msg", which is the latest match using the second paren (the earlier match of ",pref" is lost). The third match is "msg", which is the latest match using the third paren (the earlier match of "pref" is lost). And all the rest are null because there were no successful sub-matches with parens 4 thru 7.
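The regex that produced the output above is not shown in the thread, so the sketch below uses a simplified, hypothetical pattern with only three groups. It is not the original poster's regex, but it reproduces the same behavior for groups 1 to 3 on the same input. The class and method names are also made up.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class TypeGroupsDemo {
    // Hypothetical stand-in pattern: group 1 = first keyword,
    // group 2 = a repeated ",keyword", group 3 = the keyword inside group 2.
    private static final Pattern SIMPLIFIED =
            Pattern.compile("type=(\\w+)(,(\\w+))*");

    // Returns groups 0..groupCount() of the first match, or an empty array.
    static String[] groups(String input) {
        Matcher m = SIMPLIFIED.matcher(input);
        if (!m.find()) {
            return new String[0];
        }
        String[] out = new String[m.groupCount() + 1];
        for (int i = 0; i <= m.groupCount(); i++) {
            out[i] = m.group(i);
        }
        return out;
    }

    public static void main(String[] args) {
        String[] g = groups("type=CELL,pref,msg:(703) 555-8914");
        for (int i = 1; i < g.length; i++) {
            System.out.println("gc: " + i + " = " + g[i]);
        }
        // gc: 1 = CELL
        // gc: 2 = ,msg
        // gc: 3 = msg
    }
}
```

The ",pref" repetition is matched along the way, but groups 2 and 3 are overwritten by the ",msg" repetition that follows it.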

Henry
[quote=Henry Wong]Unfortunately, I think you are confusing how regex groups work. Group 1 is always the first parenthesis. Group 2 is always the second parenthesis. etc. [/quote]

Wouldn't be the first time. My understanding comes from studying BNF and formal languages 40 years ago; I've not done much serious pattern matching with regex in any language.

[quote=Henry Wong]For example, let's say your pattern is "(hello)*". You can match a long string of 100 hello strings, but in terms of the number of groups, there will be only one group, for the one parenthesis. And its value will be assigned to the last match of the subgroup.[/quote]

Do you not get any indication that you matched "hello" vs "hellohellohello"? Both meet the rule.

Do extra parens help?

So if the term is (foo|baz)*, does that mean foobazbazfoo is not matched?
i.e. foo or baz, repeated as many times as you want?

 

[quote]So if the term is (foo|baz)*, does that mean foobazbazfoo is not matched?
i.e. foo or baz, repeated as many times as you want?[/quote]



In this case, it does match, but the result is probably not what you are expecting.

Group zero (which hasn't been discussed yet) is the true match of the regex, and will match "foobazbazfoo". Group 1 is the first subgroup (what the first paren matches). It matches 4 times during this match, and will be assigned the last submatch, which is "foo".
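To see that group 1 really is the last "foo" and not the first one, a quick sketch can check the start offset of the group as well as its text. The class and method names here are illustrative only.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class AlternationGroupDemo {
    private static final Pattern FOO_BAZ = Pattern.compile("(foo|baz)*");

    // Returns {group 0, group 1, start offset of group 1} as strings,
    // or null when the whole input does not match.
    static String[] describe(String input) {
        Matcher m = FOO_BAZ.matcher(input);
        if (!m.matches()) {
            return null;
        }
        return new String[] { m.group(0), m.group(1), String.valueOf(m.start(1)) };
    }

    public static void main(String[] args) {
        String[] r = describe("foobazbazfoo");
        System.out.println("group 0 = " + r[0]);          // group 0 = foobazbazfoo
        System.out.println("group 1 = " + r[1]);          // group 1 = foo
        System.out.println("group 1 starts at " + r[2]);  // group 1 starts at 9
    }
}
```

The offset 9 is where the fourth repetition begins, so the earlier "foo", "baz" and "baz" captures are no longer reachable through the matcher.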

[quote]Do you not get any indication that you matched "hello" vs "hellohellohello"? Both meet the rule.[/quote]



Well, group zero is different. But you probably mean how would you deal with each "hello". In general, the regex is changed so that find() will return the smaller portion -- probably just a "hello" with a lookbehind or lookahead, to make sure that it is attached to the previous hello, etc. (EDIT: it's probably easier to extract the chain of hellos first, and then use regex again on the chain)

Henry
 

[quote=Henry Wong]Group zero (which hasn't been discussed yet) is the true match of the regex[/quote]


Thanks Henry.
I've been playing around with it, and there seems to be no way to get the individual values of the earlier parts matched by the
(foo|baz)*
Getting the last one is easy.

Looks like I'll need to use one regex to identify the substring that matches the final BNF, and then use another to parse/split it into pieces.
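One possible shape for that two-pass approach is sketched below. It assumes keywords consist of word characters and hyphens, which is a guess and not taken from the RFC, and the class and method names are hypothetical. The first pass captures the whole comma-separated value list of each type= parameter as a single group, and the second pass splits that list.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class TypeParamParser {
    // Pass 1: one capturing group around the entire value list, so nothing
    // is lost to repetition. The keyword character class is an assumption.
    private static final Pattern TYPE_PARAM =
            Pattern.compile("type=([\\w-]+(?:,[\\w-]+)*)");

    static List<String> parseTypes(String input) {
        List<String> types = new ArrayList<>();
        Matcher m = TYPE_PARAM.matcher(input);
        while (m.find()) {                      // handles type=X;type=Y;type=Z
            // Pass 2: split the captured list into keywords (handles type=X,Y,Z).
            for (String keyword : m.group(1).split(",")) {
                types.add(keyword);
            }
        }
        return types;
    }

    public static void main(String[] args) {
        System.out.println(parseTypes("type=CELL,pref,msg:(703) 555-8914")); // [CELL, pref, msg]
        System.out.println(parseTypes("type=X;type=Y;type=Z"));              // [X, Y, Z]
    }
}
```

The inner repetition uses a non-capturing group (?:...), so the only capture is the full list and both forms from the spec go through the same code path.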

Where is snobol when we need it?
This thread has been viewed 1290 times.