
Dumb regexp question

 
Tony Smith
Ranch Hand
Posts: 77
I am using the java.util.regex.Matcher and java.util.regex.Pattern classes.

I am trying to parse an input string into tags and text (yes, I'll be looking at Velocity soonish) and am having a bad brain day.

Text is what you think it is. Tags look like "${name}".

Here's my regex so far:

Pattern.compile("(\\$\\{([_a-zA-Z0-9]+)\\}){0,1}([^$]*)", Pattern.DOTALL)

This is supposed to say, "recognize zero or one tags, followed by a lot of text that doesn't include dollar signs."

Then I loop over the input string, building a List of tag and text objects. Now I have a compiled template. It works most of the time :-/

The problems with that regexp are

1) It seems to eat '$' when they aren't associated with tags. For example, the string "The car cost $10000" would be recognized as two separate pieces of text with the '$' being eaten.

2) The regexp also seems to match 'nothing', so the final result is always an empty one: a match is returned, but all the groups are null or zero-length strings.

I've gone around and around on this; whenever I fix one issue I create another. Any suggestions?
 
Alan Moore
Ranch Hand
Posts: 262
It isn't eating the dollar signs; it just isn't matching them. It seems you want a dollar sign to be treated like any other character if it's not the start of a tag, but there's no provision for that in your regex. One way to express it is with a regex that, each time it is applied, matches either a sequence of normal text or a tag, normal text being defined as one or more of any character other than '$', or '$' if it's not followed by '{'.

This will still silently skip over any "${" sequence that isn't part of a well-formed tag, but you can add another alternative to deal with that: if you get a match, but both group(1) and group(2) are null, pitch a fit.
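The regex from this post did not survive in the thread, so the following is only a sketch that fits the description above, not necessarily the exact pattern that was posted. The class name TemplateTokenizer, the "TEXT:"/"TAG:" prefixes, and the choice of IllegalArgumentException for "pitching a fit" are assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class TemplateTokenizer {
    // group(1): normal text, i.e. one or more of: any run of non-'$' characters,
    //           or a '$' that is not followed by '{'
    // group(2): the name inside a well-formed ${name} tag
    // The last alternative catches a "${" that does not begin a well-formed tag;
    // it sets neither group, which is the signal to complain.
    private static final Pattern TOKEN = Pattern.compile(
        "((?:[^$]+|\\$(?!\\{))+)|\\$\\{([_a-zA-Z0-9]+)\\}|\\$\\{");

    static List<String> tokenize(String input) {
        List<String> tokens = new ArrayList<>();
        Matcher m = TOKEN.matcher(input);
        while (m.find()) {
            if (m.group(1) != null) {
                tokens.add("TEXT:" + m.group(1));
            } else if (m.group(2) != null) {
                tokens.add("TAG:" + m.group(2));
            } else {
                // Both groups are null: a "${" that is not part of a valid tag.
                throw new IllegalArgumentException("Malformed tag at index " + m.start());
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(tokenize("Hello ${name}, you owe $5"));
        System.out.println(tokenize("The car cost $10000"));
    }
}
```

With this pattern, "The car cost $10000" comes back as a single piece of text with the '$' intact, because the lookahead lets a lone '$' count as normal text.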

Notice that these regexes are always required to match at least one character. That isn't true of your regex; the first part is controlled by "{0,1}" (which is the same as "?" BTW), while the second part is controlled by "*". Net result: zero characters required. When you use the find() or lookingAt() methods with a regex that can match zero characters, you'll always get a match--whether you want it or not.
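To see the zero-length matches concretely, here is a small sketch that runs the regex from the first post over the "$10000" example and records where each match starts and ends. The class and method names (EmptyMatchDemo, matchSpans) are made up for this illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class EmptyMatchDemo {
    // The regex from the first post, unchanged.
    static final Pattern ORIGINAL = Pattern.compile(
        "(\\$\\{([_a-zA-Z0-9]+)\\}){0,1}([^$]*)", Pattern.DOTALL);

    // Collects "start-end" for every match that find() reports.
    static List<String> matchSpans(String input) {
        List<String> spans = new ArrayList<>();
        Matcher m = ORIGINAL.matcher(input);
        while (m.find()) {
            spans.add(m.start() + "-" + m.end());
        }
        return spans;
    }

    public static void main(String[] args) {
        System.out.println(matchSpans("The car cost $10000"));
    }
}
```

For "The car cost $10000" the spans should be 0-13, 13-13, 14-19 and 19-19. The empty match at 13 sits right in front of the '$', which is then stepped over because find() advances one character after an empty match, and the empty match at 19 is the "null result" at the end of the input.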
 
Tony Smith
Ranch Hand
Posts: 77
Thanks for your kind reply. The real problem has little to do with matching '$'s - that was really desperation on my part. The real problem is to recognize text and tags... Pulling from the posting above, another regexp is:



Which says, "any amount of text followed by a tag" OR "at least one character". The latter part keeps the regexp from matching once the input is depleted.
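The regexp itself is missing from this post, so what follows is only a guess at a pattern that reads the way the sentence above does; it may differ from what was actually posted. The class name LazyTemplateTokenizer and the "TEXT:"/"TAG:" prefixes are assumptions for the sake of a runnable example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class LazyTemplateTokenizer {
    // First alternative: any amount of text (lazy), followed by a tag.
    //   group(1) = the text before the tag, group(2) = the tag name
    // Second alternative: at least one character, i.e. trailing text with no tag.
    //   group(3) = that text
    // DOTALL lets '.' match line terminators so text may span lines.
    private static final Pattern TOKEN = Pattern.compile(
        "(.*?)\\$\\{([_a-zA-Z0-9]+)\\}|(.+)", Pattern.DOTALL);

    static List<String> tokenize(String input) {
        List<String> tokens = new ArrayList<>();
        Matcher m = TOKEN.matcher(input);
        while (m.find()) {
            if (m.group(2) != null) {
                if (!m.group(1).isEmpty()) {
                    tokens.add("TEXT:" + m.group(1));
                }
                tokens.add("TAG:" + m.group(2));
            } else {
                tokens.add("TEXT:" + m.group(3));
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(tokenize("Hello ${name}, you owe $5"));
    }
}
```

Because both alternatives need at least one character, find() simply returns false at the end of the input instead of reporting an empty match. Note that in this sketch a "${" that never becomes a well-formed tag is quietly treated as text.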

Thanks much, you got me out of my rut.

[ September 30, 2005: Message edited by: Tony Smith ]
 
Akshay Kiran
Ranch Hand
Posts: 220
Look, if all you want is to extract the tags,

String patt= "\\$+";
and now
String[] tags= input.split(patt);
and voila, the tags array holds the pieces of the input between runs of '$'; every piece after the first begins right where a tag would start, so you can use it to further simplify things to suit your purpose.
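As a quick check of what that split call actually hands back, here is a minimal sketch; the class and method names (SplitDemo, splitOnDollars) are invented for the example.

```java
import java.util.Arrays;

class SplitDemo {
    // Splits the input on every run of one or more '$' characters.
    static String[] splitOnDollars(String input) {
        return input.split("\\$+");
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(splitOnDollars("Hello ${name}, you owe $5")));
    }
}
```

For "Hello ${name}, you owe $5" the pieces are "Hello ", "{name}, you owe " and "5". The '$' characters are dropped, a non-tag '$' splits the text just like a tag does, and each piece still needs further parsing to pull out a tag name.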

If you give me a proper specification, I might be able to help you better. Have a nice day!
 
Alan Moore
Ranch Hand
Posts: 262
Looks good, Tony.
 