
Dumb regexp question

I am using the java.util.regex.Matcher and java.util.regex.Pattern classes.

I am trying to parse an input string into tags and text (yes, I'll be looking at Velocity soonish) and am having a bad brain day.

Text is what you think it is. Tags look like "${name}".

Here's my regex so far:

Pattern.compile("(\\$\\{([_a-zA-Z0-9]+)\\}){0,1}([^$]*)", Pattern.DOTALL)

This is supposed to say, "recognize zero or one tags, followed by a lot of text that doesn't include dollar signs."

Then I loop over the input string building a List of tag and text classes. Now I have a compiled template. It works most of the time :-/

The problems with that regexp are

1) It seems to eat '$' when they aren't associated with tags. For example, the string "The car cost $10000" would be recognized as two separate pieces of text with the '$' being eaten.

2) It seems that 'nothing' (the empty string) matches this regex, so the final result is always a null result - that is, a match is returned, but all the groups are null or zero-length strings.

I've gone around and around on this; whenever I fix one issue I create another. Any suggestions?
It isn't eating the dollar signs, it just isn't matching them. It seems you want a dollar sign to be treated like any other character if it's not the start of a tag, but there's no provision for that in your regex. Here's one way you can express it:
Each time this regex is applied, it will match either a sequence of normal text or a tag, normal text being defined as one or more of any character other than '$', or '$' if it's not followed by '{'. This will still silently skip over any "${" sequence that isn't part of a well-formed tag, but you can add another alternative to deal with that:

If you get a match, but both group(1) and group(2) are null, pitch a fit.
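
The patterns from this reply are not shown above, so here is a rough sketch of one regex that fits the description. It is not the poster's original. The class name TemplateTokenizer, the "TEXT:"/"TAG:" prefixes, and the group numbering (group 1 for text, group 2 for the tag name, no group for a stray "${") are assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class TemplateTokenizer {
    // Alternative 1, group(1): a run of text; '$' counts as text unless followed by '{'.
    // Alternative 2, group(2): the name inside a well-formed ${name} tag.
    // Alternative 3, no group: a "${" that does not start a well-formed tag.
    private static final Pattern TOKEN = Pattern.compile(
        "((?:[^$]+|\\$(?!\\{))+)|\\$\\{([_a-zA-Z0-9]+)\\}|\\$\\{");

    public static List<String> tokenize(String input) {
        List<String> tokens = new ArrayList<>();
        Matcher m = TOKEN.matcher(input);
        while (m.find()) {
            if (m.group(1) != null) {
                tokens.add("TEXT:" + m.group(1));
            } else if (m.group(2) != null) {
                tokens.add("TAG:" + m.group(2));
            } else {
                // Matched, but both groups are null: a malformed tag.
                throw new IllegalArgumentException("Malformed tag at index " + m.start());
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        for (String token : tokenize("The car cost $10000, ${name}!")) {
            System.out.println(token);
        }
    }
}
```

Every alternative consumes at least one character, so find() cannot return an empty match, and the '$' in "$10000" stays inside the surrounding text.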

Notice that these regexes are always required to match at least one character. That isn't true of your regex; the first part is controlled by "{0,1}" (which is the same as "?" BTW), while the second part is controlled by "*". Net result: zero characters required. When you use the find() or lookingAt() methods with a regex that can match zero characters, you'll always get a match--whether you want it or not.
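
To see the zero-length matches in action, here is a small sketch that runs the pattern from the first post in a find() loop and collects whatever each match covers. The class name ZeroLengthMatchDemo and the helper allMatches are made up for this example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class ZeroLengthMatchDemo {
    // The pattern from the first post: optional tag, then text without '$'.
    private static final Pattern ORIGINAL = Pattern.compile(
        "(\\$\\{([_a-zA-Z0-9]+)\\}){0,1}([^$]*)", Pattern.DOTALL);

    // Collects the full text of every match that find() reports.
    public static List<String> allMatches(String input) {
        List<String> matches = new ArrayList<>();
        Matcher m = ORIGINAL.matcher(input);
        while (m.find()) {
            matches.add(m.group());
        }
        return matches;
    }

    public static void main(String[] args) {
        for (String s : allMatches("The car cost $10000")) {
            System.out.println("[" + s + "]");
        }
    }
}
```

For "The car cost $10000" this gives four matches: "The car cost ", an empty match at the '$', "10000", and an empty match at the end of the input. The '$' never appears in any match because after an empty match find() resumes one character further along, which is why it looks as if the dollar sign were eaten.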
Thanks for your kind reply. The real problem has little to do with matching '$'s - that was really desperation on my part. Now, the real problem is to recognize text and tags... Pulling from the posting above, another regexp is:



Which says, "any amount of text followed by a tag" OR "at least one character". The latter part keeps the regexp from matching once the input is depleted.
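
The pattern itself is missing from this copy of the post. As a guess at what a regex of that shape could look like (an assumption, not the poster's actual pattern), here is a sketch where group 1 is the text before a tag, group 2 is the tag name, and group 3 is whatever text is left when no more tags follow. The class name TextThenTagDemo and the "TEXT:"/"TAG:" prefixes are invented for the example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class TextThenTagDemo {
    // Hypothetical pattern: "(text)(tag)" OR "(at least one character)".
    // DOTALL lets '.' cross line breaks so multi-line text stays in one piece.
    private static final Pattern PIECE = Pattern.compile(
        "(.*?)\\$\\{([_a-zA-Z0-9]+)\\}|(.+)", Pattern.DOTALL);

    public static List<String> parse(String input) {
        List<String> pieces = new ArrayList<>();
        Matcher m = PIECE.matcher(input);
        while (m.find()) {
            if (m.group(2) != null) {
                // First alternative: optional leading text, then a tag.
                if (!m.group(1).isEmpty()) {
                    pieces.add("TEXT:" + m.group(1));
                }
                pieces.add("TAG:" + m.group(2));
            } else {
                // Second alternative: trailing text with no tag after it.
                pieces.add("TEXT:" + m.group(3));
            }
        }
        return pieces;
    }

    public static void main(String[] args) {
        for (String piece : parse("Hello ${name}, you owe $5")) {
            System.out.println(piece);
        }
    }
}
```

Both alternatives need at least one character, so the loop ends cleanly when the input runs out. One trade-off of this shape: a stray "${" that never closes is simply treated as text rather than reported as an error.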

Thanks much, you got me out of my rut.

[ September 30, 2005: Message edited by: Tony Smith ]
Look, if all you want is to extract the tags,

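String patt = "\\$+";
and now
String[] tags = input.split(patt);
and voila, the input is split at every run of '$', so each piece after the first starts with "{name}" wherever a tag followed the dollar sign. Use the tags array to further simplify things to suit your purpose.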

If you give me the proper specification, I might be able to help you better. Have a nice day!
Looks good, Tony.