Decipher regular expression

 
Ranch Hand
Posts: 41
Need help deciphering this regular expression used for email addresses:
^[^\\s]+@[^\\s].*\\.[a-zA-Z]+$

Does this say:
anything (letters and numbers) that is not whitespace, followed by @
then anything that is not whitespace, followed by a period
then letters only?
 
Ranch Hand
Posts: 107
Yes, it does. The first portion means ANYTHING except whitespace, so symbols are included there. Something like
\b[A-Z0-9._%+-]+@[A-Z0-9.-]+\.[A-Z]{2,4}\b
is a little more explicit and, in my opinion, clearer to read for e-mail.
 
Ranch Hand
Posts: 35

Bobby Smallman wrote:\b[A-Z0-9._%+-]+@[A-Z0-9.-]+\.[A-Z]{2,4}\b


That one will not match lower case letters, nor letters used in languages other than English. It will not match more than two levels in domain names, e.g. IamTheBoss@jpl.nasa.gov will not be matched. It is probably safe to assume that top level domains will remain in the A-Z range, but it is not obvious that top level domains will never be 5 or more letters long in the future. Or even a single letter. The regex in the original post should take care of all of this.
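To check these points, here is a small Java sketch (the class and method names are made up for illustration). Bobby's pattern is written with single backslashes, so they are doubled in the string literal; the \b word boundaries make it suited to finding an address inside running text with Matcher.find(). As written the pattern is case-sensitive and rejects the mixed-case address. Compiled with Pattern.CASE_INSENSITIVE it does find IamTheBoss@jpl.nasa.gov, because the dot inside [A-Z0-9.-] already admits several domain levels, so in that example it is the letter case, not the number of levels, that causes the mismatch.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class WordBoundaryEmailDemo {
    // Bobby's pattern; backslashes are doubled for the Java string literal.
    static final String REGEX = "\\b[A-Z0-9._%+-]+@[A-Z0-9.-]+\\.[A-Z]{2,4}\\b";

    static final Pattern AS_WRITTEN = Pattern.compile(REGEX);
    static final Pattern IGNORING_CASE = Pattern.compile(REGEX, Pattern.CASE_INSENSITIVE);

    // Returns the first address-like substring in the text, or null if there is none.
    static String firstMatch(Pattern pattern, String text) {
        Matcher m = pattern.matcher(text);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        String text = "Mail IamTheBoss@jpl.nasa.gov for details.";
        // null: the character classes only allow upper case A-Z
        System.out.println(firstMatch(AS_WRITTEN, text));
        // IamTheBoss@jpl.nasa.gov: the dot in [A-Z0-9.-] admits several domain levels
        System.out.println(firstMatch(IGNORING_CASE, text));
    }
}
```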

BTW, Java regex offers a convenient way to express letters:
\p{L} Any letter (including non-English letters)
\p{Lu} Upper case letter
Remember to add an extra \ in front (\\p{L}) when used in a string literal!
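A quick illustration of those classes (the class name is hypothetical). The non-ASCII characters are written as Unicode escapes in the string literals so the source compiles regardless of the file encoding.

```java
import java.util.regex.Pattern;

final class UnicodeLetterDemo {
    // In a Java string literal the backslash of \p must be doubled.
    static final Pattern ANY_LETTERS = Pattern.compile("\\p{L}+");
    static final Pattern UPPER_CASE_LETTERS = Pattern.compile("\\p{Lu}+");
    static final Pattern ASCII_LETTERS = Pattern.compile("[a-zA-Z]+");

    public static void main(String[] args) {
        String manana = "Ma\u00f1ana"; // contains an n with tilde, written as an escape
        System.out.println(ANY_LETTERS.matcher(manana).matches());            // true
        System.out.println(ASCII_LETTERS.matcher(manana).matches());          // false: the n-tilde is not in a-z
        System.out.println(UPPER_CASE_LETTERS.matcher("\u00c5SA").matches()); // true: A with ring is upper case
        System.out.println(UPPER_CASE_LETTERS.matcher("Asa").matches());      // false
    }
}
```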
 
Sheriff
Posts: 22783

M Wilson wrote:Need help deciphering this regular expression used for email addresses:
^[^\\s]+@[^\\s].*\\.[a-zA-Z]+$

Does this say:
anything (letters and numbers) that is not whitespace, followed by @
then anything that is not whitespace, followed by a period
then letters only?


Almost. There is a . right after the second [^\\s] block. So essentially this regex says:
- one or more non-whitespace characters
- @
- one non-whitespace character
- anything (zero or more characters, spaces included)
- a dot
- one or more letters from a-z and A-Z
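The breakdown above can be checked with a small Java sketch (class and method names are made up). The sample inputs show one consequence of the "anything" step: once the first character after the @ has been matched, whitespace is accepted again.

```java
import java.util.regex.Pattern;

final class OriginalPatternDemo {
    // The pattern from the original post, as a Java string literal.
    static final Pattern ORIGINAL =
            Pattern.compile("^[^\\s]+@[^\\s].*\\.[a-zA-Z]+$");

    static boolean matches(String address) {
        return ORIGINAL.matcher(address).matches();
    }

    public static void main(String[] args) {
        System.out.println(matches("rob@student.tue.nl")); // true: several domain levels
        System.out.println(matches("rob@localhost"));      // false: no dot after the @
        System.out.println(matches("rob@x y.com"));        // true: .* lets the space through
        System.out.println(matches("rob @example.com"));   // false: whitespace before the @
    }
}
```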

The part between the @ and the a-z / A-Z part needs reworking. You can have multiple subdomains; back at University I had an @student.tue.nl email address. So you want to quantify the [^\\s] and dot together as well: ([^\\s]+\\.)*. That * means that rob@localhost is still allowed.
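A sketch of that suggestion, assuming the group ([^\\s]+\\.)* takes the place of the [^\\s].*\\. part of the original pattern. The thread does not spell out the complete regex, so the assembled form below is an assumption, and the class name is hypothetical. A side effect of dropping .* is that a space after the @ is no longer accepted.

```java
import java.util.regex.Pattern;

final class SubdomainPatternDemo {
    // Assumed assembly: zero or more "non-whitespace run + dot" groups,
    // followed by the letters-only last part of the original pattern.
    static final Pattern WITH_SUBDOMAINS =
            Pattern.compile("^[^\\s]+@([^\\s]+\\.)*[a-zA-Z]+$");

    static boolean matches(String address) {
        return WITH_SUBDOMAINS.matcher(address).matches();
    }

    public static void main(String[] args) {
        System.out.println(matches("rob@student.tue.nl")); // true: several levels
        System.out.println(matches("rob@localhost"));      // true: the * allows zero dotted parts
        System.out.println(matches("rob@x y.com"));        // false: no .* left to let the space through
    }
}
```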

If I need to do any email address format validation I usually just use javax.mail.internet.InternetAddress
 
T Dahl
Ranch Hand
Posts: 35

Rob Prime wrote:The part between the @ and the a-z / A-Z part needs reworking. You can have multiple subdomains; back at University I had an @student.tue.nl email address. So you want to quantify the [^\\s] and dot together as well: ([^\\s]+\\.)*. That * means that rob@localhost is still allowed.


[^\\s] will match a dot, so multiple domain levels are matched. Good point about matching single level domains. Whether or not to allow that may be application dependent, I guess. Maybe localhost should be the only single level domain you allow? E.g.
^[^\\s]+@(([^\\s].*\\.[a-zA-Z]+)|(localhost))$
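A minimal check of that alternation in Java (the class name is hypothetical): a dotted domain ending in letters, or exactly localhost, while any other single level name is rejected.

```java
import java.util.regex.Pattern;

final class LocalhostPatternDemo {
    // A dotted domain ending in letters, or the literal name "localhost".
    static final Pattern DOTTED_OR_LOCALHOST =
            Pattern.compile("^[^\\s]+@(([^\\s].*\\.[a-zA-Z]+)|(localhost))$");

    static boolean matches(String address) {
        return DOTTED_OR_LOCALHOST.matcher(address).matches();
    }

    public static void main(String[] args) {
        System.out.println(matches("rob@student.tue.nl")); // true
        System.out.println(matches("rob@localhost"));      // true
        System.out.println(matches("rob@intranet"));       // false: other single level names are rejected
    }
}
```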

If you do allow single level names in general, you should also consider allowing a wider range of characters than [a-zA-Z] for that name. Would you allow DonJuan@Mañana1, for example? (Not sure if this looks right on your screen.)
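If single level names are allowed, one hypothetical way to widen the character range is to use Unicode classes: \p{L} for letters and \p{N} for digits. The pattern below is only a sketch of that idea, not a rule taken from any standard, and the class name is made up.

```java
import java.util.regex.Pattern;

final class SingleLevelNameDemo {
    // Hypothetical rule: a single level name of Unicode letters (\p{L}) and digits (\p{N}).
    static final Pattern SINGLE_LEVEL =
            Pattern.compile("^[^\\s]+@[\\p{L}\\p{N}]+$");
    // The narrower ASCII-only alternative, for comparison.
    static final Pattern ASCII_ONLY =
            Pattern.compile("^[^\\s]+@[a-zA-Z]+$");

    public static void main(String[] args) {
        String address = "DonJuan@Ma\u00f1ana1"; // n with tilde plus a trailing digit
        System.out.println(SINGLE_LEVEL.matcher(address).matches()); // true
        System.out.println(ASCII_ONLY.matcher(address).matches());   // false
    }
}
```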

Rob Prime wrote:
If I need to do any email address format validation I usually just use javax.mail.internet.InternetAddress


Another good point.
 
Greenhorn
Posts: 6
Using this: ^[^\\s]+@(([^\\s].*\\.[a-zA-Z]+)|(localhost))$

[^\\s] also matches underscore.
And I wonder if domain names have underscores in them.

So the [^\\s] check after the @ would accept _ even if web sites do not have underscores in their names...
Correct me if I'm wrong about my assumption regarding underscores in domain names.

^[^\\s]+@(([^\\s&&[^_]].*\\.[a-zA-Z]+)|(localhost))$
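A sketch of the underscore idea (the class name is hypothetical). It uses the plain class [^\\s_], "neither whitespace nor underscore", which expresses the same intent as the && intersection in a single class. As the samples show, only the first character after the @ is checked; the .* that follows still accepts underscores further along.

```java
import java.util.regex.Pattern;

final class NoLeadingUnderscoreDemo {
    // [^\\s_] matches one character that is neither whitespace nor an underscore.
    static final Pattern PATTERN =
            Pattern.compile("^[^\\s]+@(([^\\s_].*\\.[a-zA-Z]+)|(localhost))$");

    static boolean matches(String address) {
        return PATTERN.matcher(address).matches();
    }

    public static void main(String[] args) {
        System.out.println(matches("joe@_example.com")); // false: underscore right after the @
        System.out.println(matches("joe@ex_ample.com")); // true: .* still lets later underscores through
        System.out.println(matches("joe@example.com"));  // true
    }
}
```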
 
T Dahl
Ranch Hand
Posts: 35

g bag wrote:
And I wonder if domain names have underscores in them


Good question. The RFCs that define DNS allow any characters. Applications may be more restrictive.

BTW, the full DNS name may be at most 255 octets, and each component (between the dots) at most 63 octets. I leave it as an exercise for the reader to enforce these restrictions in a regex.
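One possible answer to the exercise, as a sketch: a lookahead caps the whole name at 255 characters, and each dot-separated label is limited to 1 to 63 characters. The regex counts characters rather than octets, which is the same thing only for ASCII names, and it checks nothing but lengths, whitespace and empty labels. The class and helper names are hypothetical.

```java
import java.util.Arrays;
import java.util.regex.Pattern;

final class DnsLengthDemo {
    // (?=.{1,255}$) limits the total length; [^.\\s]{1,63} limits each label.
    static final Pattern DOMAIN_LENGTHS =
            Pattern.compile("^(?=.{1,255}$)[^.\\s]{1,63}(\\.[^.\\s]{1,63})*$");

    static boolean withinLimits(String domain) {
        return DOMAIN_LENGTHS.matcher(domain).matches();
    }

    // Builds a label made of n copies of the letter a.
    static String label(int n) {
        char[] chars = new char[n];
        Arrays.fill(chars, 'a');
        return new String(chars);
    }

    public static void main(String[] args) {
        String l63 = label(63);
        System.out.println(withinLimits("jpl.nasa.gov"));                            // true
        System.out.println(withinLimits(label(64) + ".com"));                        // false: label longer than 63
        System.out.println(withinLimits(String.join(".", l63, l63, l63, l63)));      // true: exactly 255
        System.out.println(withinLimits(String.join(".", l63, l63, l63, l63, "a"))); // false: 257 in total
        System.out.println(withinLimits("a..b"));                                    // false: empty label
    }
}
```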
 
M Wilson
Ranch Hand
Posts: 41
Thank you all!
 