Let me get this straight. You have a String list(groups("A")),list(groups("B")),list(groups("C")). You want to retrieve A, B and C. Right? Sounds like something a regular expression could easily do, with a capturing group.
Which API should I use, and is there a best practice for doing that?
Thanks a lot!
Renato Bobbio Calogero
posted 7 years ago
I tried with the following:
but it returns just the string " ... Do you have any suggestion on which regex I should use? The strings between the " and " characters are variable, and they do not follow a standard pattern. I just need to get rid of stuff, getting only the contents and not the keywords, and I would rather not implement any business logic to recognize whether the values that populate the list are useless stuff or the contents I need. Do you have any hint? Thanks ...
Let's take your original regular expression. I only now see one flaw: "\"*\"" means zero or more occurrences of " followed by a single ". That's because in regular expressions * is a metacharacter that applies to the preceding element, in this case the ". It doesn't work like command-line wildcards, where * means "any character, any number of times". A quick (and incorrect) fix: "\".*\"". With the dot in place, the * applies to the dot instead, so the regular expression becomes a single " followed by any character any number of times followed by a single ". That looks more like it. So we test it:
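A minimal sketch of such a test, assuming the input string from the first post. The class name GreedyDemo and the helper findAll are hypothetical and only serve this illustration:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class GreedyDemo {
    // Assumed input: the string quoted in the first post of this thread.
    static final String INPUT =
        "list(groups(\"A\")),list(groups(\"B\")),list(groups(\"C\"))";

    // Hypothetical helper: collects every match of the regex in the text.
    static List<String> findAll(String regex, String text) {
        List<String> matches = new ArrayList<>();
        Matcher matcher = Pattern.compile(regex).matcher(text);
        while (matcher.find()) {
            matches.add(matcher.group());
        }
        return matches;
    }

    public static void main(String[] args) {
        // Original attempt: zero or more quotes followed by a quote.
        System.out.println(findAll("\"*\"", INPUT));
        // Quick fix: a quote, any characters (greedy), then a quote.
        System.out.println(findAll("\".*\"", INPUT));
    }
}
```

With this input, the first pattern should only ever match single quote characters, and the second should produce one long match running from the first " to the last ".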
Not quite what we want, and the reason is simple: .* is greedy. It takes everything between the first " and the last ". What we want is to take everything between each " and the next ".
There are two ways:
1) make the matching not greedy but reluctant. We do this by appending a simple ? after the *; see also the Javadoc I pointed you to. The regex becomes "\".*?\"".
2) do not match everything but only everything that's not a ". We can use a negated character class for that: [^"]. The regex becomes "\"[^\"]*\"".
Both now result in this output:
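As a minimal sketch, again assuming the input string from the first post, the two patterns can be checked side by side. The class name QuotedMatchDemo, the constants and the helper findAll are hypothetical:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class QuotedMatchDemo {
    // Assumed input: the string quoted in the first post of this thread.
    static final String INPUT =
        "list(groups(\"A\")),list(groups(\"B\")),list(groups(\"C\"))";

    // Option 1: reluctant quantifier, stops at the nearest closing quote.
    static final String RELUCTANT = "\".*?\"";
    // Option 2: negated character class, can never run past a quote.
    static final String NEGATED = "\"[^\"]*\"";

    // Hypothetical helper: collects every match of the regex in the text.
    static List<String> findAll(String regex, String text) {
        List<String> matches = new ArrayList<>();
        Matcher matcher = Pattern.compile(regex).matcher(text);
        while (matcher.find()) {
            matches.add(matcher.group());
        }
        return matches;
    }

    public static void main(String[] args) {
        System.out.println(findAll(RELUCTANT, INPUT)); // ["A", "B", "C"]
        System.out.println(findAll(NEGATED, INPUT));   // ["A", "B", "C"]
    }
}
```

Each match still includes the surrounding quotes, which is why one more step is needed to get at the bare values.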
Now all you need to do is use substring to get the values. Another option is to use a capturing group, by wrapping the part you want in ( and ). You will then get a group 1 inside the matcher. Group 0 (the entire match, which is what group() returns; group() and group(0) are equivalent) is no longer relevant:
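A hedged sketch of both options, assuming the same input string as before. The names GroupDemo, withSubstring and withGroup are made up for illustration:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class GroupDemo {
    // Assumed input: the string quoted in the first post of this thread.
    static final String INPUT =
        "list(groups(\"A\")),list(groups(\"B\")),list(groups(\"C\"))";

    // Option A: match including the quotes, then strip them with substring.
    static List<String> withSubstring(String text) {
        List<String> values = new ArrayList<>();
        Matcher matcher = Pattern.compile("\"[^\"]*\"").matcher(text);
        while (matcher.find()) {
            String match = matcher.group();
            // Drop the first and last character, which are the quotes.
            values.add(match.substring(1, match.length() - 1));
        }
        return values;
    }

    // Option B: put the wanted part in a capturing group and read group 1.
    static List<String> withGroup(String text) {
        List<String> values = new ArrayList<>();
        Matcher matcher = Pattern.compile("\"([^\"]*)\"").matcher(text);
        while (matcher.find()) {
            values.add(matcher.group(1));
        }
        return values;
    }

    public static void main(String[] args) {
        System.out.println(withSubstring(INPUT)); // [A, B, C]
        System.out.println(withGroup(INPUT));     // [A, B, C]
    }
}
```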
Note that it's important to use a loop for find(), and not a single if. That's because you can have multiple matches, and with if you only check the first one. The loop will let you check them all.
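To illustrate the difference, here is a small sketch that counts how many matches each approach sees. It assumes the same input string, and the class and method names are hypothetical:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class LoopVsIfDemo {
    // Assumed input: the string quoted in the first post of this thread.
    static final String INPUT =
        "list(groups(\"A\")),list(groups(\"B\")),list(groups(\"C\"))";

    static final Pattern QUOTED = Pattern.compile("\"([^\"]*)\"");

    // A single if only ever looks at the first match.
    static int countWithIf(String text) {
        Matcher matcher = QUOTED.matcher(text);
        int count = 0;
        if (matcher.find()) {
            count++;
        }
        return count;
    }

    // A while loop keeps calling find() until no match is left.
    static int countWithWhile(String text) {
        Matcher matcher = QUOTED.matcher(text);
        int count = 0;
        while (matcher.find()) {
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        System.out.println(countWithIf(INPUT));    // 1
        System.out.println(countWithWhile(INPUT)); // 3
    }
}
```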