
Decipher regular expression

 
M Wilson
Ranch Hand
Posts: 41
I need help deciphering this regular expression, which is used to validate email addresses:
^[^\\s]+@[^\\s].*\\.[a-zA-Z]+$

Does this say:
anything (letters and numbers) that is not whitespace, followed by @
then anything that is not whitespace, followed by a period
then letters only?
 
Bobby Smallman
Ranch Hand
Posts: 107
Yes, it does. The first portion means anything except whitespace, so symbols are included there. Something like
\b[A-Z0-9._%+-]+@[A-Z0-9.-]+\.[A-Z]{2,4}\b
is a little more explicit and clearer to read, in my opinion, for e-mail.
 
T Dahl
Ranch Hand
Posts: 35
Bobby Smallman wrote: \b[A-Z0-9._%+-]+@[A-Z0-9.-]+\.[A-Z]{2,4}\b

That one will not match lower-case letters (unless it is compiled case-insensitively) or letters used in languages other than English, so e.g. IamTheBoss@jpl.nasa.gov will not be matched as written. Multiple domain levels are not the problem, since the dot is part of [A-Z0-9.-]. It is probably safe to assume that top-level domains will remain in the A-Z range, but the {2,4} limit is not safe: top-level domains longer than four letters, such as .museum, already exist, and a single-letter one is conceivable. The regex in the original post takes care of all of this.
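
A quick way to check these points is to run the expression through java.util.regex. The sketch below is only an illustration: the class name and the sample addresses are made up, and it assumes whole-string matching with matches().

```java
import java.util.regex.Pattern;

// Hypothetical demo class; the name and the sample addresses are illustrative only.
final class UpperCaseRegexDemo {
    static final String REGEX = "\\b[A-Z0-9._%+-]+@[A-Z0-9.-]+\\.[A-Z]{2,4}\\b";

    // Compiled exactly as posted: only upper-case ASCII letters are accepted.
    static final Pattern AS_WRITTEN = Pattern.compile(REGEX);

    // Same expression, compiled case-insensitively.
    static final Pattern IGNORE_CASE = Pattern.compile(REGEX, Pattern.CASE_INSENSITIVE);

    // True only if the whole string matches the pattern.
    static boolean matches(Pattern p, String s) {
        return p.matcher(s).matches();
    }

    public static void main(String[] args) {
        String nasa = "IamTheBoss@jpl.nasa.gov";
        System.out.println(matches(AS_WRITTEN, nasa));   // false: lower-case letters
        System.out.println(matches(IGNORE_CASE, nasa));  // true: '.' is inside [A-Z0-9.-]
        System.out.println(matches(IGNORE_CASE, "curator@example.museum")); // false: {2,4}
    }
}
```

With the CASE_INSENSITIVE flag the three-level domain is accepted, while the six-letter top-level domain is still rejected by {2,4}.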

BTW Java regex offers a convenient way to express letters:
\p{L} Any letter (also non-English)
\p{Lu} Upper case letter
Remember to add an extra \ at the front when it is used in a Java string literal, e.g. "\\p{L}"!
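
A small sketch of these two classes in use, with the doubled backslash needed in a string literal. The class and method names are made up for the example, and the non-ASCII letters are written as Unicode escapes so that the source file encoding does not matter.

```java
import java.util.regex.Pattern;

// Hypothetical demo class; the names are illustrative only.
final class UnicodeLetterDemo {
    // One or more letters from any script (\p{L}).
    static boolean allLetters(String s) {
        return Pattern.matches("\\p{L}+", s);
    }

    // True if the first character is an upper-case letter (\p{Lu}).
    static boolean startsUpper(String s) {
        return Pattern.compile("^\\p{Lu}").matcher(s).find();
    }

    public static void main(String[] args) {
        String manana = "Ma\u00f1ana";                    // "Manana" with n-tilde
        System.out.println(allLetters(manana));           // true
        System.out.println(allLetters(manana + "1"));     // false: the digit is not a letter
        System.out.println(startsUpper(manana));          // true
        System.out.println(startsUpper("\u00f1andu"));    // false: starts with lower-case n-tilde
    }
}
```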
 
Rob Spoor
Sheriff
Posts: 21133
M Wilson wrote: Need help deciphering this regular expression used in email address:
^[^\\s]+@[^\\s].*\\.[a-zA-Z]+$

Does this say:
anything (letters and numbers) not whitespace follow by @
then anything not white space follow by a period
then alpha only?

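Almost. There is a .* right after the second [^\\s] block, and that block itself has no quantifier. So essentially this regex says:
- 1 or more non-whitespace characters
- @
- exactly one non-whitespace character
- anything (zero or more of any character, spaces included)
- a dot
- one or more characters from a-z and A-Z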
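
The breakdown above can be checked with a few sample inputs. This is just a sketch: the class name and the addresses are invented for the example, and the pattern is the one from the original post with the backslashes doubled for a Java string literal.

```java
import java.util.regex.Pattern;

// Hypothetical demo class; the name and the sample addresses are illustrative only.
final class OriginalRegexDemo {
    // The regex from the original post.
    static final Pattern ORIGINAL = Pattern.compile("^[^\\s]+@[^\\s].*\\.[a-zA-Z]+$");

    // True only if the whole string matches.
    static boolean matches(String s) {
        return ORIGINAL.matcher(s).matches();
    }

    public static void main(String[] args) {
        System.out.println(matches("user@student.tue.nl")); // true
        System.out.println(matches("rob@localhost"));       // false: no dot after the @
        System.out.println(matches("a@b c.com"));           // true: .* also matches the space
        System.out.println(matches("a b@c.com"));           // false: whitespace before the @
        System.out.println(matches("a@b.c0m"));             // false: digit in the last part
    }
}
```

The third case shows why the middle part needs reworking: whitespace is accepted anywhere after the first character following the @.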

The part between the @ and the a-z / A-Z part needs reworking. You can have multiple subdomains; back at University I had an @student.tue.nl email address. So you want to quantify the [^\\s] and dot together as well: ([^\\s]+\\.)*. That * means that rob@localhost is still allowed.
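
For illustration, here is one way the suggested fragment could be slotted into the full expression. How the pieces fit together is an assumption on my part (the post only gives the fragment), and the class name and sample addresses are made up.

```java
import java.util.regex.Pattern;

// Hypothetical demo class. The assembled pattern is an assumption: it places
// ([^\s]+\.)* between the @ and the final [a-zA-Z]+ of the original regex.
final class ReworkedRegexDemo {
    static final Pattern REWORKED = Pattern.compile("^[^\\s]+@([^\\s]+\\.)*[a-zA-Z]+$");

    // True only if the whole string matches.
    static boolean matches(String s) {
        return REWORKED.matcher(s).matches();
    }

    public static void main(String[] args) {
        System.out.println(matches("rob@localhost"));       // true: the group may repeat zero times
        System.out.println(matches("user@student.tue.nl")); // true
        System.out.println(matches("a@b c.com"));           // false: whitespace is no longer allowed
        System.out.println(matches("user@example.c0m"));    // false: last part must be letters only
    }
}
```

Nested quantifiers such as ([^\s]+\.)* can backtrack heavily on long non-matching input, so this form is probably best kept to short strings.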

If I need to do any email address format validation I usually just use javax.mail.internet.InternetAddress
 
T Dahl
Ranch Hand
Posts: 35
Rob Prime wrote: The part between the @ and the a-z / A-Z part needs reworking. You can have multiple subdomains; back at University I had an @student.tue.nl email address. So you want to quantify the [^\\s] and dot together as well: ([^\\s]+\\.)*. That * means that rob@localhost is still allowed.

[^\\s] will match a dot, so multiple domain levels are matched. Good point about matching single-level domains. Whether or not to allow that may be application-dependent, I guess. Maybe localhost should be the only single-level domain allowed? E.g.
^[^\\s]+@(([^\\s].*\\.[a-zA-Z]+)|(localhost))$

If you do allow single-level names in general, you should also consider allowing a wider range of characters than [a-zA-Z] for that name. Would you allow DonJuan@Mañana1, for example? (not sure if this looks right on your screen)
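
A short sketch of how the localhost variant behaves, including the DonJuan example. The class name and the other sample addresses are invented, and the ñ is written as a Unicode escape to sidestep source encoding issues.

```java
import java.util.regex.Pattern;

// Hypothetical demo class; the name and the sample addresses are illustrative only.
final class LocalhostRegexDemo {
    // The variant posted above: the original domain part, or the literal "localhost".
    static final Pattern WITH_LOCALHOST =
            Pattern.compile("^[^\\s]+@(([^\\s].*\\.[a-zA-Z]+)|(localhost))$");

    // True only if the whole string matches.
    static boolean matches(String s) {
        return WITH_LOCALHOST.matcher(s).matches();
    }

    public static void main(String[] args) {
        System.out.println(matches("rob@localhost"));        // true
        System.out.println(matches("rob@intranet"));         // false: other single-level names
        System.out.println(matches("user@student.tue.nl"));  // true
        System.out.println(matches("DonJuan@Ma\u00f1ana1")); // false: no dot, not localhost
    }
}
```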
Rob Prime wrote:
If I need to do any email address format validation I usually just use javax.mail.internet.InternetAddress

Another good point.
 
Vib Mator
Greenhorn
Posts: 6
Using this: ^[^\\s]+@(([^\\s].*\\.[a-zA-Z]+)|(localhost))$

[^\\s] also matches the underscore,
and I wonder whether domain names can have underscores in them.

So the [^\\s] check after @ would accept _ even if websites do not have underscores in their names...
Correct me if I am wrong in my assumption about domain names with underscores.

^[^\\s]+@(([^\\s&&[^_]].*\\.[a-zA-Z]+)|(localhost))$
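
A quick check of the underscore question, as a sketch only. In java.util.regex, \s is the whitespace class and does not contain the underscore (\w does), so [^\s] accepts an underscore simply because it is not whitespace. The NO_UNDERSCORE pattern below is a hypothetical alternative that adds _ to the negated class instead of using an intersection; it is not the expression posted above, and like that expression it only restricts the first character after the @.

```java
import java.util.regex.Pattern;

// Hypothetical demo class; the names and the sample addresses are illustrative only.
final class UnderscoreDemo {
    // The regex quoted at the top of this post.
    static final Pattern POSTED =
            Pattern.compile("^[^\\s]+@(([^\\s].*\\.[a-zA-Z]+)|(localhost))$");

    // Assumed alternative: exclude '_' by listing it in the negated class.
    static final Pattern NO_UNDERSCORE =
            Pattern.compile("^[^\\s]+@(([^\\s_].*\\.[a-zA-Z]+)|(localhost))$");

    public static void main(String[] args) {
        System.out.println(Pattern.matches("\\s", "_"));                 // false
        System.out.println(Pattern.matches("\\w", "_"));                 // true
        System.out.println(POSTED.matcher("a@_b.com").matches());        // true
        System.out.println(NO_UNDERSCORE.matcher("a@_b.com").matches()); // false
        System.out.println(NO_UNDERSCORE.matcher("a@b_c.com").matches()); // true: .* still allows '_'
    }
}
```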
 
T Dahl
Ranch Hand
Posts: 35
g bag wrote:
And i wonder if domain names have underscores in them

Good question. The RFCs that define DNS allow any characters. Applications may be more restrictive.

BTW, the full DNS name may be at most 255 octets, and each component (between the dots) at most 63 octets. I leave it as an exercise for the reader to enforce these restrictions in a regex.
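
Not a regex, but for comparison here is a minimal sketch of the same two limits in plain Java. Several things are assumptions of the example: the class and method names are made up, octets are counted as UTF-8 bytes of the textual name (the wire-format encoding is ignored), and empty components, including one caused by a trailing dot, are rejected.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper; applies the two limits quoted above to the textual name.
final class DnsLengthCheck {
    static final int MAX_NAME_OCTETS = 255;
    static final int MAX_LABEL_OCTETS = 63;

    // Assumption: one octet per UTF-8 byte of the text.
    static int octets(String s) {
        return s.getBytes(StandardCharsets.UTF_8).length;
    }

    static boolean withinLimits(String name) {
        if (name.isEmpty() || octets(name) > MAX_NAME_OCTETS) {
            return false;
        }
        // The -1 limit keeps trailing empty components so they can be rejected.
        for (String label : name.split("\\.", -1)) {
            int n = octets(label);
            if (n == 0 || n > MAX_LABEL_OCTETS) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(withinLimits("jpl.nasa.gov")); // true
        System.out.println(withinLimits("a..b"));         // false: empty component
    }
}
```

A method like this could be called on the part after the @ once a looser regex has accepted the overall shape.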
 
M Wilson
Ranch Hand
Posts: 41
Thank you all!
 