
Regex question

 
Frank Zammetti
Ranch Hand
Posts: 136
Hi all... I've done a little with regular expressions over the years, but I always cringe when I have to.

I have what I think is going to be a very simple question to answer, so let me set myself up for feeling stupid...

I need to allow for wildcards in a URL specification. So, I need the following URL:

/app/test/image.gif

...to match the following:

/app/test/*.gif

In other words, I need to be able to specify that all GIFs in the path /app/test should match. I DON'T want to have to specify a regex here though; I want a non-developer to be able to use a * wildcard, so my expectation is that I will be replacing the asterisk in the string with a regular expression. I have the following as a test:

import java.util.regex.*;

public class test {
    public static void main(String[] args) {
        // The wildcard specification a non-developer would type
        String s = "/app/test/*.gif";
        System.out.println("s = " + s);

        // Build the regex by replacing each '*' (this replacement is the part in question)
        String ms = "";
        for (int i = 0; i < s.length(); i++) {
            if (s.charAt(i) == '*') {
                ms += "[*]";
            } else {
                ms += s.charAt(i);
            }
        }
        System.out.println("ms = " + ms);

        // Test the generated regex against a URL that should match
        Pattern p = Pattern.compile(ms);
        Matcher m = p.matcher("/app/test/image.gif");
        if (m.matches()) {
            System.out.println("found");
        } else {
            System.out.println("NOT found");
        }
    }
}

Now, that doesn't actually match, so it would seem the [*] I replace the asterisk with is not appropriate. So, this is more a regex question than a Java question, but what is the correct regex to insert in place of the asterisk?

TIA!
 
Frank Zammetti
Ranch Hand
Posts: 136
Actually, changing...

[*]

...to...

.*

...in the code I posted seems to work. Can anyone verify that is the right answer? I'm a little worried about real periods... are they somehow escaped? Doesn't seem to be required, but I'm not sure.
 
Henry Wong
author
Sheriff
Posts: 23295
With file matching, the "*" means zero or more of any character.

With Regex, the "*" means zero or more of the preceding element (a character, a character class, or a group). Also, the "." means any character. So... ".*" means zero or more of any character, which is what you wanted.

Also, as noticed, a "." has special meaning to a regex. So if you really want to match a period, you will need to escape the character in the regex.
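To make the point about the period concrete, here is a minimal sketch of the wildcard-to-regex conversion. It is not from the posts above: the class name WildcardDemo and the helper wildcardToRegex are made up for illustration, and it assumes "*" is the only wildcard you need to support. It uses Pattern.quote so that everything other than the asterisk (including ".") is matched literally.

```java
import java.util.regex.Pattern;

public class WildcardDemo {

    // Hypothetical helper: turns a '*' wildcard spec into a regex string.
    public static String wildcardToRegex(String spec) {
        StringBuilder regex = new StringBuilder();
        int start = 0;
        int star;
        while ((star = spec.indexOf('*', start)) >= 0) {
            // Quote the literal text before the asterisk so '.' and friends stay literal.
            if (star > start) {
                regex.append(Pattern.quote(spec.substring(start, star)));
            }
            // The asterisk itself becomes "zero or more of any character".
            regex.append(".*");
            start = star + 1;
        }
        // Quote whatever literal text follows the last asterisk.
        if (start < spec.length()) {
            regex.append(Pattern.quote(spec.substring(start)));
        }
        return regex.toString();
    }

    // Hypothetical helper: true if the whole URL matches the wildcard spec.
    public static boolean matches(String spec, String url) {
        return Pattern.matches(wildcardToRegex(spec), url);
    }

    public static void main(String[] args) {
        String spec = "/app/test/*.gif";
        System.out.println(matches(spec, "/app/test/image.gif")); // true
        System.out.println(matches(spec, "/app/test/imageXgif")); // false: the '.' is literal
        System.out.println(matches(spec, "/app/test/image.png")); // false
    }
}
```

Note that ".*" also matches "/", so this spec would accept /app/test/sub/image.gif as well. If the wildcard should stay within one path segment, something like "[^/]*" in place of ".*" may be closer to what you want.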

Henry
 
Frank Zammetti
Ranch Hand
Posts: 136
Thanks Henry! One last question... how is the period escaped? In my code as posted, \. would break it because Java doesn't recognize it... backslash AFAIK has the same basic meaning in a regular expression, so how would one do it? Would you have \\. in the string fed to Pattern.compile?
 
Henry Wong
author
Sheriff
Posts: 23295
Originally posted by Frank Zammetti:
Thanks Henry! One last question... how is the period escaped? In my code as posted, \. would break it because Java doesn't recognize it... backslash AFAIK has the same basic meaning in a regular expression, so how would one do it? Would you have \\. in the string fed to Pattern.compile?


Yes... the correct way to escape a "." in a regex is "\.". Unfortunately, you also need to get it into a Java string literal, where "\" has its own special meaning... so... you need to escape the backslash as well, giving "\\.".
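A small sketch of the difference, in case it helps. The class name EscapeDemo and its constants are made up for illustration; the only library call used is Pattern.matches.

```java
import java.util.regex.Pattern;

public class EscapeDemo {
    // The Java literal "\\." is the two-character regex \. (a literal period).
    public static final String ESCAPED = ".*\\.gif";
    // Unescaped: the '.' before "gif" matches any single character.
    public static final String UNESCAPED = ".*.gif";

    public static void main(String[] args) {
        // The string itself holds just two characters: a backslash and a period.
        System.out.println("\\.".length());                          // 2
        System.out.println(Pattern.matches(ESCAPED, "image.gif"));   // true
        System.out.println(Pattern.matches(ESCAPED, "imageXgif"));   // false
        System.out.println(Pattern.matches(UNESCAPED, "imageXgif")); // true
    }
}
```

The last line is the case to worry about: without the escape, the pattern quietly accepts names that do not contain a real period.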

And BTW, do a couple more regexes, preferably complex ones, and you'll learn to stop cringing.

Henry
 
Frank Zammetti
Ranch Hand
Posts: 136
Thanks Henry, that makes sense.

I don't know about stopping cringing... I can't see that.

Ironically, I have a couple of things to do today that will involve some fairly heavy regex lifting, so whether I like it or not, I may be implementing your suggestion.
 
Max Habibi
town drunk
( and author)
Sheriff
Posts: 4118
Hi Frank,

Sorry I missed your question. But yes, Henry is right: do some more regex and you'll stop cringing, I promise.

M
 
M Beck
Ranch Hand
Posts: 323
actually, it's a good thing that you're cringing; it'll teach you not to overuse regular expressions. they're handy, and sometimes necessary, but there are often much easier and better ways to parse strings. whenever possible, try splitting strings some other way; by scanning for instances of unique separators with indexOf() and splitting with substring(), using StringTokenizer, or using pretty much whatever doesn't make you write a regex.

the reason for this is that regexes, powerful as they are, tend to be a write-only language. i avoid them for the same reason i avoid Perl; i know i can get it to work, but i'm quite unsure if i will ever be able to modify it again afterwards, or even understand how it works once i've forgotten. sometimes they're the best tool, and sometimes they're unavoidable — but they're also, in my humble opinion, very overused.

especially in Java. the need to double-escape those annoying backslashes makes Java's syntax for regexes not only quite unique (so that people used to regexes from elsewhere will have trouble reading them), but also uniquely annoying to read. every time i see a thread on JavaRanch where the "solution" suggested is a quintuple backslash ("\\\\\" — and i've seen that more than once!) i think to myself, these people should be programming in brainf*ck. making sense of all those contiguous backslashes is about as hard as trying to read Lisp while using the parenthesis-count as your guide to program structure, by which i mean it's just about impossible.

parsing strings "by hand" is more verbose and can be more of a pain. but at least you'll be able to read your own code six months down the line and understand how it does whatever it does.
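for the original wildcard question, a "by hand" version might look something like the sketch below. it is only an illustration: the class name SimpleWildcard and the method matches are made up, and it assumes the spec contains at most one "*". it uses nothing but indexOf(), substring(), startsWith() and endsWith().

```java
public class SimpleWildcard {

    // Hypothetical helper: matches a URL against a spec with at most one '*'.
    public static boolean matches(String spec, String url) {
        int star = spec.indexOf('*');
        if (star < 0) {
            // No wildcard at all: the strings must be identical.
            return spec.equals(url);
        }
        String prefix = spec.substring(0, star);   // text before the '*'
        String suffix = spec.substring(star + 1);  // text after the '*'
        // The length check stops the prefix and suffix from overlapping in the URL.
        return url.length() >= prefix.length() + suffix.length()
                && url.startsWith(prefix)
                && url.endsWith(suffix);
    }

    public static void main(String[] args) {
        System.out.println(matches("/app/test/*.gif", "/app/test/image.gif")); // true
        System.out.println(matches("/app/test/*.gif", "/app/test/image.png")); // false
        System.out.println(matches("ab*ba", "aba"));                           // false
    }
}
```

there are no backslashes to count, and the period needs no escaping because nothing here is a regex. the trade-off is that supporting several asterisks would take more code than this.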
 