Jaikiran Pai wrote:My skills at regex aren't too great but I have a relatively simple regex which is expected to match email address patterns from a string. The code is pretty straightforward and works fine except when it gets passed a very specific input.
"Leadership is nature's way of removing morons from the productive flow" - Dogbert
Articles by Winston can be found here
Winston Gutkowski wrote:
What input?
My advice: Write out, in English, what you expect that regex to do.
There are plenty of regex dweebs here who will tell you if you're wrong ;)
Jaikiran Pai wrote:1) A word (ex: abcd)
2) possibly followed by certain special characters (ex: the underscore character, the dot character)
3) possibly followed by another word again
4) Followed by the @ character
5) Followed by another word (ex: gmail)
6) Followed by the dot character
7) Followed by another word (ex: com)
8) Possibly followed by a combination of dot character and a word (ex: .co.uk)
Putting that all together it would be abcd_xyz@gmail.co.uk (just a random example).
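The eight steps above translate into a pattern almost literally. This is a minimal sketch and not the regex from the original post, which is not shown in the thread. The class name `EmailSteps`, the reading of "a word" as `\w+`, and the folding of steps 2 and 3 into one optional group are all assumptions made for illustration.

```java
import java.util.regex.Pattern;

public class EmailSteps {
    // Assumption: "a word" means one or more \w characters (letters, digits, underscore).
    static final Pattern EMAIL = Pattern.compile(
          "\\w+"            // 1) a word
        + "(?:[_.]\\w+)?"   // 2) + 3) optionally one special character and another word
        + "@"               // 4) the @ character
        + "\\w+"            // 5) another word
        + "\\."             // 6) the dot character
        + "\\w+"            // 7) another word
        + "(?:\\.\\w+)*");  // 8) optionally more ".word" parts

    public static void main(String[] args) {
        System.out.println(EMAIL.matcher("abcd_xyz@gmail.co.uk").matches()); // true
        System.out.println(EMAIL.matcher("abcd_xyz").matches());             // false
    }
}
```

Note that the underscore is itself a `\w` character, so "abcd_xyz" can be consumed by step 1 alone. Overlaps of this kind between a "word" and the "special characters" are what give a regex engine many equivalent ways to split the same input.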
"Leadership is nature's way of removing morons from the productive flow" - Dogbert
Articles by Winston can be found here
To earn money on java go to upwork.com
Ulf Dittmer wrote:My guess is that it will terminate eventually, given enough time and memory. Have you tried it with a much shorter input string composed only of underscore characters? This could be a situation where the time required grows exponentially with the length of the input string.
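One way to try this out is to count how many characters the regex engine reads for underscore-only inputs of increasing length. The sketch below wraps the input in a `CharSequence` that counts `charAt` calls. The pattern `NESTED` is a hypothetical example with a nested quantifier, not the regex from the original post, and the class and method names are made up for illustration. How quickly the count grows depends on the pattern and on the JDK version, so no specific numbers are claimed here.

```java
import java.util.Arrays;
import java.util.regex.Pattern;

public class BacktrackGrowth {
    // Hypothetical pattern with a nested quantifier; NOT the regex from the thread.
    static final Pattern NESTED = Pattern.compile("(\\w+[_.]?)*@");

    // CharSequence wrapper that counts how often the regex engine reads a character.
    static final class CountingSequence implements CharSequence {
        private final String text;
        long reads = 0;
        CountingSequence(String text) { this.text = text; }
        @Override public int length() { return text.length(); }
        @Override public char charAt(int index) { reads++; return text.charAt(index); }
        @Override public CharSequence subSequence(int start, int end) {
            return text.subSequence(start, end);
        }
        @Override public String toString() { return text; }
    }

    // Character reads needed to conclude that n underscores contain no match.
    static long readsFor(int n) {
        char[] chars = new char[n];
        Arrays.fill(chars, '_');
        CountingSequence input = new CountingSequence(new String(chars));
        if (NESTED.matcher(input).find()) {
            throw new IllegalStateException("unexpected match");
        }
        return input.reads;
    }

    public static void main(String[] args) {
        // Keep n small: the work may grow very quickly with the input length.
        for (int n = 4; n <= 16; n += 4) {
            System.out.println(n + " underscores: " + readsFor(n) + " character reads");
        }
    }
}
```

If the printed counts roughly multiply each time the length goes up by a fixed amount, rather than growing by a fixed amount, that points to the exponential behaviour described above.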
Winston Gutkowski wrote:
Jaikiran Pai wrote:1) A word (ex: abcd)
OK, well the boundary is '\\w', NOT '\\w+'.
You also need to define what a "word" is.
Jaikiran Pai wrote:2) possibly followed by certain special characters (ex: the underscore character, the dot character)
Too vague. What characters? Precisely. E-mail addresses have a specification, so you need to get those from there.
Jaikiran Pai wrote:3) possibly followed by another word again
Again: too vague. Do you specifically mean the characters detailed in Steps 1 and 2?
Jaikiran Pai wrote:5) Followed by another word (ex: gmail)
See above.
Jaikiran Pai wrote:7) Followed by another word (ex: com)
Are you sure?
Volodymyr Levytskyi wrote:This is happening because of backtracking by the greedy quantifiers in the regex.
I know of two ways to avoid backtracking: atomic groups and possessive quantifiers. Both match as many characters as possible, like a greedy quantifier, but never release (backtrack over) the characters they have matched.
So I just used an atomic group for the first greedy plus, and the problem went away.
So, regex becomes:
(?>\\w+) means that it matches as many characters from [a-zA-Z0-9_] as possible and never releases a matched character.
Volodymyr Levytskyi wrote:
This article will be useful to read.
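To make the difference between greedy, atomic and possessive matching concrete, here is a small sketch. The three patterns and the class name `AtomicDemo` are illustrative assumptions, not the regex from this thread.

```java
import java.util.regex.Pattern;

public class AtomicDemo {
    // Greedy: \w+ first takes "abc1", then gives back "1" so that \d can match.
    static final Pattern GREEDY = Pattern.compile("\\w+\\d");
    // Atomic group: \w+ takes "abc1" and never gives anything back, so \d finds nothing.
    static final Pattern ATOMIC = Pattern.compile("(?>\\w+)\\d");
    // Possessive quantifier: same effect as the atomic group in this case.
    static final Pattern POSSESSIVE = Pattern.compile("\\w++\\d");

    public static void main(String[] args) {
        String input = "abc1";
        System.out.println(GREEDY.matcher(input).matches());     // true
        System.out.println(ATOMIC.matcher(input).matches());     // false
        System.out.println(POSSESSIVE.matcher(input).matches()); // false
    }
}
```

The same property that stops the runaway backtracking also changes what can match: an atomic group or possessive quantifier is only safe where whatever follows it never needs characters that it has already consumed.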
Ulf Dittmer wrote:As I understand it, the problem is not validating emails, it's finding emails in a larger piece of text.
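For searching rather than validating, `Matcher.find()` can be called in a loop to pull every match out of a longer string. The following is a rough sketch under assumptions: the pattern is an illustrative one that uses atomic groups, not the regex from this thread, it is far looser than the e-mail specification, and `EmailFinder` and `findEmails` are made-up names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class EmailFinder {
    // Illustrative pattern only: word(.word)*@word(.word)+ with atomic groups around \w+.
    static final Pattern EMAIL = Pattern.compile(
        "(?>\\w+)(?:\\.(?>\\w+))*@(?>\\w+)(?:\\.(?>\\w+))+");

    // Collects every substring of text that the pattern matches, in order.
    static List<String> findEmails(String text) {
        List<String> result = new ArrayList<>();
        Matcher m = EMAIL.matcher(text);
        while (m.find()) {
            result.add(m.group());
        }
        return result;
    }

    public static void main(String[] args) {
        String text = "Contact abcd_xyz@gmail.co.uk or john.doe@example.com today.";
        System.out.println(findEmails(text));
        // [abcd_xyz@gmail.co.uk, john.doe@example.com]
    }
}
```

Because `find()` retries the pattern at each later start position when an attempt fails, a pattern that backtracks heavily costs even more in a search than in a single `matches()` call.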