regex question

 
Hendra Kurniawan
Ranch Hand
Posts: 239
The requirement is:
the string must have (one or more Latin letters (a-z or A-Z)) AND (one or more Latin digits (0-9)). Right now, it's just Latin characters that I need to cover; others like Chinese or Japanese, well, let that be someone else's headache.
What I have in mind:

"(.*[a-zA-Z]+.*\\d+.*|.*\\d.*[a-zA-Z]+.*)"

Is there any way to simplify this? Or is this the best way there is? Thanks
 
Steve Luke
Bartender
Posts: 4181
It seems to me like you are checking if the string has at least one [a-zA-Z] character, and at least one digit. For me, it would be clearer to do this:
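
The snippet from this post was not preserved. A minimal sketch of the idea, assuming two separate String.matches() calls combined with && (the class and method names are hypothetical, not Steve's):

```java
class TwoChecks {
    // True if s contains at least one Latin letter and at least one digit 0-9.
    // Assumes single-line input: '.' does not match line terminators by default.
    static boolean hasLetterAndDigit(String s) {
        boolean hasLetter = s.matches(".*[a-zA-Z].*"); // at least one a-z or A-Z
        boolean hasDigit = s.matches(".*[0-9].*");     // at least one 0-9
        return hasLetter && hasDigit;
    }

    public static void main(String[] args) {
        System.out.println(hasLetterAndDigit("abc123")); // true
        System.out.println(hasLetterAndDigit("abc"));    // false
        System.out.println(hasLetterAndDigit("123"));    // false
    }
}
```

Each condition gets its own name, so the intent can be read without decoding one long pattern.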

What you have above seems like a lot of symbology which makes it hard to read - granted this is coming from a non-regex guy. But I would prefer something a little more wordy if it does the same job and is more self-explanatory.
 
Winston Gutkowski
Bartender
Posts: 10575
Hendra Kurniawan wrote:Is there any way to simplify this? Or is this the best way there is? Thanks

Personally, I'd do it separately, with one check for a letter and one for a digit. But if you really feel you must do it in a single "matches"-style regex (which is what it looks like), what you have looks reasonable, except that you don't need the '+'s. You can also reduce a bit of redundancy by taking the "fluff" out of the brackets, viz:
".*([a-zA-Z].*\\d|\\d.*[a-zA-Z]).*"
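
The code from this post did not survive. As a rough sketch of the "do it separately" idea, assuming precompiled Pattern objects and Matcher.find() (the class, field, and method names below are illustrative, not Winston's):

```java
import java.util.regex.Pattern;

class SeparateChecks {
    // Compiled once and reused for every call.
    private static final Pattern LETTER = Pattern.compile("[a-zA-Z]");
    private static final Pattern DIGIT = Pattern.compile("\\d");

    // find() looks for the pattern anywhere in the input,
    // so no ".*" padding is needed around it.
    static boolean hasLetterAndDigit(CharSequence s) {
        return LETTER.matcher(s).find() && DIGIT.matcher(s).find();
    }

    public static void main(String[] args) {
        System.out.println(hasLetterAndDigit("abc123")); // true
        System.out.println(hasLetterAndDigit("123abc")); // true
        System.out.println(hasLetterAndDigit("abc"));    // false
    }
}
```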
Winston
 
Hendra Kurniawan
Ranch Hand
Posts: 239
My brain is in the process of digesting regex, so I'm trying these out. I'm trying to find an elegant solution to this seemingly simple problem, but ended up with the ugly syntax you guys saw up there. Yes, breaking it into two matching processes definitely solves it, but I wish to know if there's a new technique that I can learn. Who knows if there's an elegant one-liner solution out there? Thanks
 
Junilu Lacar
Sheriff
Posts: 11494
Hendra Kurniawan wrote:Who knows if there's an elegant one-liner solution out there? Thanks


Elegant in what sense? To me, elegant code is code that is expressive. It's code that I can read without hurting my brain in the process of understanding what that code does. IMO, it's kind of hard to top the elegance of what Winston already gave; I don't really care that it's more than one line of code, I care that it took me less than 3 seconds to understand what it was doing. But maybe elegance, like beauty, is in the eye of the beholder (shrug)
 
Campbell Ritchie
Marshal
Posts: 56584
I have seen some 1980s‑style C code, from the days when you paid so much per megabyte for memory. Then, people tried to cram as much into one line as they could. That is part of the reason why we have a proliferation of arithmetical operators; it takes two keystrokes fewer to write i++; than i=i+1;
People actually thought it was good to write the shortest code possible, and boy could they get it short.

Now we can get several gigabytes’ memory for the price of a good dinner, we can spread our wings and write code people can actually read.
 
Winston Gutkowski
Bartender
Posts: 10575
Hendra Kurniawan wrote:My brain is in the process of digesting regex, so I'm trying these out. I'm trying to find an elegant solution to this seemingly simple problem, but ended up with the ugly syntax you guys saw up there. Yes, breaking it into two matching processes definitely solves it, but I wish to know if there's a new technique that I can learn.

Not a new technique, just new classes: i.e. Pattern (java.util.regex.Pattern) and Matcher (java.util.regex.Matcher).
Regexes aren't all about String.matches() you know. In fact, the chances are that that will be the slowest solution, even with regexes.

If you want the fastest solution (keep your ears closed, Mr. Knuth), don't use regexes at all: a plain loop over the characters is probably quite a bit quicker than a regex, and furthermore, it will work for almost any alpha and digit characters.
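
The loop itself is missing from the post. A minimal sketch of what such a check might look like, assuming Character.isLetter() and Character.isDigit(), which accept non-Latin letters and digits as well (the class and method names are hypothetical):

```java
class LoopCheck {
    // True if s contains at least one letter and at least one digit.
    // Works char by char, so supplementary (surrogate-pair) characters are not handled.
    static boolean hasLetterAndDigit(String s) {
        boolean letter = false;
        boolean digit = false;
        // Stop as soon as both kinds of character have been seen.
        for (int i = 0; i < s.length() && !(letter && digit); i++) {
            char c = s.charAt(i);
            if (Character.isLetter(c)) {
                letter = true;
            } else if (Character.isDigit(c)) {
                digit = true;
            }
        }
        return letter && digit;
    }

    public static void main(String[] args) {
        System.out.println(hasLetterAndDigit("abc123"));       // true
        System.out.println(hasLetterAndDigit("abc"));          // false
        System.out.println(hasLetterAndDigit("\u00e9" + "7")); // true: e-acute counts as a letter
    }
}
```

To restrict the check to a-z, A-Z and 0-9 only, the two Character calls could be swapped for explicit range comparisons.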

Regexes are great, but they're not good for everything; and, like lots of other things in programming, sometimes you have to break them up to keep them manageable.

Winston
 
fred rosenberger
lowercase baba
Bartender
Posts: 12565
Don't forget the famous quote:
Some people, when confronted with a problem, think "I know, I'll use regular expressions." Now they have two problems.
 
Winston Gutkowski
Bartender
Posts: 10575
Junilu Lacar wrote:IMO, it's kind of hard to top the elegance of what Winston already gave;

Thanks mate, that cheered me up. Except then I went and spoiled it all with my last post...

Winston
 
Winston Gutkowski
Bartender
Posts: 10575
fred rosenberger wrote:Don't forget the famous quote:
Some people, when confronted with a problem, think "I know, I'll use regular expressions." Now they have two problems.

You know? I had. Thanks for reminding me.

Winston
 
Winston Gutkowski
Bartender
Posts: 10575
Winston Gutkowski wrote:You know? I had. Thanks for reminding me...

@Hendra: In looking for the source of Fred's quote, I came across this site, which may interest you. The chap who wrote it seems to be quite a regex geek, and it's fairly light-hearted; there's also some good cautionary stuff in there (check out the "(x+x+)+y" example).

Winston
 
Hendra Kurniawan
Ranch Hand
Posts: 239
@Junilu:
To me, elegant means short, easy to understand, and most importantly correct. While my solution is correct, it certainly is neither short nor easy to understand. In this case, correct means: fulfils the two requirements above. That's all.

So, everybody agrees that no "elegant" regex solution exists for this simple requirement? I'm also not a regex enthusiast. Hell, I just heard about regex several months ago. Let's just say I'm trying to explore new options to perform mundane tasks like validations and stuff. Normally I do exactly what Winston did in his post: use a loop to do the checking. Simple and easy to understand. As for speed, I believe the difference to be insignificant.
 
Winston Gutkowski
Bartender
Posts: 10575
Hendra Kurniawan wrote:So, everybody agrees that no "elegant" regex solution exists for this simple requirement?

I think we'd probably agree that no elegant single regex solution exists, but as both Steve and I have shown you, they can be made much more so by splitting the problem up.

Hendra Kurniawan wrote:Normally I do exactly what Winston did in his post: use a loop to do the checking. Simple and easy to understand.

And good for only one thing. The whole point about regexes is that they're generic - write once, use many times - and if they're not abused, they're usually fast enough.

Hendra Kurniawan wrote:As for speed, I believe the difference to be insignificant.

Then you need to read more. The expression I gave in my last post takes 25 seconds to parse 25 characters (or at least it did in 2008) and exhibits exponential, O(2^n), behaviour. And while I'm a firm advocate of Knuth's maxim (see also the quote below), you'll soon discover how good or bad a regex can be when you have a million lines of log files to check.

I'd also suspect that
".*([a-zA-Z].*\\d|\\d.*[a-zA-Z]).*"
is significantly faster than
"(.*[a-zA-Z]+.*\\d+.*|.*\\d.*[a-zA-Z]+.*)"
but I can't be bothered to test it. I also agree that neither is "elegant".
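
Leaving speed aside, it is easy to check that the two expressions accept and reject the same inputs. A small sketch, with hypothetical class and constant names and an arbitrary set of samples; it says nothing about which one is faster:

```java
class RegexComparison {
    static final String TRIMMED = ".*([a-zA-Z].*\\d|\\d.*[a-zA-Z]).*";
    static final String ORIGINAL = "(.*[a-zA-Z]+.*\\d+.*|.*\\d.*[a-zA-Z]+.*)";

    // True if both expressions give the same verdict for s.
    static boolean agree(String s) {
        return s.matches(TRIMMED) == s.matches(ORIGINAL);
    }

    public static void main(String[] args) {
        String[] samples = {"abc123", "1a", "a-1", "abc", "123", ""};
        for (String s : samples) {
            // Print the input, the verdict of the trimmed regex, and whether both agree.
            System.out.println("\"" + s + "\" -> " + s.matches(TRIMMED) + ", agree: " + agree(s));
        }
    }
}
```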

Speed is rarely a good reason for sacrificing readability and never for correctness; but in this case you've been offered solutions that do neither, and are almost certainly faster than your original expression.

Winston
 
Henry Wong
author
Sheriff
Posts: 23295
Winston Gutkowski wrote:
Hendra Kurniawan wrote:As for speed, I believe the difference to be insignificant.

Then you need to read more. The expression I gave in my last post takes 25 seconds to parse 25 characters (or at least it did in 2008) and exhibits exponential, O(2^n), behaviour. And while I'm a firm advocate of Knuth's maxim (see also the quote below), you'll soon discover how good or bad a regex can be when you have a million lines of log files to check.

I'd also suspect that
".*([a-zA-Z].*\\d|\\d.*[a-zA-Z]).*"
is significantly faster than
"(.*[a-zA-Z]+.*\\d+.*|.*\\d.*[a-zA-Z]+.*)"
but I can't be bothered to test it. I also agree that neither is "elegant".

Speed is rarely a good reason for sacrificing readability and never for correctness; but in this case you've been offered solutions that do neither, and are almost certainly faster than your original expression.



In my opinion, it is not just a matter of an ugly regex. This...

"(.*[a-zA-Z]+.*\\d+.*|.*\\d.*[a-zA-Z]+.*)"

...quite frankly won't scale, and can break with even a very simple change in the requirements. For example, let's change the requirement to at least one lower case letter, at least one upper case letter, and at least one number. Following from what it looks like you have done to get to this regex, can I assume that you would reach this answer? (That is based on the fact that you used the alternation operator on the possible orders, and padded it with the possibility of any amount of any character.)

"(.*[a-z]+.*[A-Z]+.*\\d+.*|.*\\d+.*[a-z]+.*[A-Z]+.*|.*[A-Z]+.*[a-z]+.*\\d+.*|.*\\d+.*[A-Z]+.*[a-z]+.*|.*[a-z]+.*\\d+.*[A-Z]+.*|.*[A-Z]+.*\\d+.*[a-z]+.*)"



Add a requirement for a punctuation character, and you'll get a headache...

Henry
 
Henry Wong
author
Sheriff
Posts: 23295
Hendra Kurniawan wrote:
So, everybody agrees that no "elegant" regex solution exists for this simple requirement? I'm also not a regex enthusiast. Hell, I just heard about regex several months ago. Let's just say I'm trying to explore new options to perform mundane tasks like validations and stuff. Normally I do exactly what Winston did in his post: use a loop to do the checking. Simple and easy to understand. As for speed, I believe the difference to be insignificant.


Of course, there is an "elegant" solution to this requirement... but elegance is in the eye of the beholder. There is certainly a regex that is stable -- meaning one that doesn't dramatically increase in size due to a simple additional requirement. It is certainly possible to get so comfortable with regular expressions that you arrive at a regex which looks very simple (and elegant) to you -- and yet is completely confusing to someone new to regexes.

Add enough comments, and you can also convince yourself that anyone should understand the regex that you created... I know that I can convince myself of that, but that doesn't mean that it is elegant. Anyway, here is my attempt... is it understandable?
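
Henry's snippet was lost. Judging from the replies, it was built from look-aheads, (?=...), joined with + and commented piece by piece. A sketch along those lines, not Henry's actual code (the class and constant names are hypothetical):

```java
class LookaheadCheck {
    // At least one Latin letter and at least one digit, in any order.
    static final String LETTER_AND_DIGIT =
            "(?=.*[a-zA-Z])" // look ahead: a letter occurs somewhere
          + "(?=.*\\d)"      // look ahead: a digit occurs somewhere
          + ".*";            // then consume the whole input, as matches() requires

    // The changed requirement from the earlier post: one more look-ahead, nothing else.
    static final String LOWER_UPPER_DIGIT =
            "(?=.*[a-z])"    // a lower case letter occurs somewhere
          + "(?=.*[A-Z])"    // an upper case letter occurs somewhere
          + "(?=.*\\d)"      // a digit occurs somewhere
          + ".*";

    public static void main(String[] args) {
        System.out.println("abc123".matches(LETTER_AND_DIGIT)); // true
        System.out.println("123abc".matches(LETTER_AND_DIGIT)); // true
        System.out.println("abc".matches(LETTER_AND_DIGIT));    // false
        System.out.println("aB1".matches(LOWER_UPPER_DIGIT));   // true
        System.out.println("ab1".matches(LOWER_UPPER_DIGIT));   // false
    }
}
```

Each look-ahead is tested from the start of the input without consuming anything, so the order in which letters and digits appear makes no difference, and a new requirement adds one line instead of multiplying the alternatives.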



Henry
 
Winston Gutkowski
Bartender
Posts: 10575
Henry Wong wrote:Anyway, here is my attempt... is it understandable?

Yeah, I figured there was a look-ahead solution, but I didn't want to blind Hendra with it.
And BTW, I take back what I said about 'no elegant single regex'; yours is very nice.
I also totally agree with you about the scalability of the original (and my 'trimmed' version).

The only issue for me is that matches() will always be slower than find(), no matter what you do; in fact I wish the String class had a find() method, because matches() is rarely the best option.
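
To make the difference concrete: matches() must match the entire input, while find() succeeds if the pattern occurs anywhere in it. A small sketch, where the static find() helper is a hypothetical stand-in for the String method that doesn't exist:

```java
import java.util.regex.Pattern;

class MatchesVsFind {
    // Hypothetical helper: true if regex occurs anywhere in input.
    // Compiles the pattern on every call; reuse a Pattern in real code.
    static boolean find(String input, String regex) {
        return Pattern.compile(regex).matcher(input).find();
    }

    public static void main(String[] args) {
        System.out.println("abc1".matches("\\d"));     // false: the whole input is not a single digit
        System.out.println("abc1".matches(".*\\d.*")); // true: padding makes the regex span the input
        System.out.println(find("abc1", "\\d"));       // true: no padding needed
    }
}
```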

Winston
 
Hendra Kurniawan
Ranch Hand
Posts: 239
Now, Henry's solution seems very good, as in very short (compared to mine anyway). So you can do that in matches, using the + sign to combine multiple regexes? Also the ?=: I googled a little and there seem to be look-ahead and look-behind. With your solution, I assume the order doesn't matter, whether the letter appears before the digit or the digit before the letter? Either case will be handled correctly? Thanks
 
Winston Gutkowski
Bartender
Posts: 10575
Hendra Kurniawan wrote:Now, Henry's solution seems very good, as in very short...

brevity != good (at least, not necessarily)

although I agree that if you absolutely must have a single regex and use matches(), Henry's is by far the best.

Winston
 