
Regular expression

 
Carlos Bonzilla
Greenhorn
Posts: 17
I have a field on my web page where users can enter comments. Only certain characters are allowed. For instance:
1. "I am allowed" is OK.
2. "I am not allowed #" is not OK.
3. "ÅÄÖåäö-,.:'éüáèç%()@" is OK.

Any suggestions for a regular expression that would solve this?
Best regards
/Carlos
 
Darryl Burke
Bartender
Posts: 5148
Here are a couple of learning resources for regex:
http://www.regular-expressions.info/
http://download.oracle.com/javase/tutorial/essential/regex/index.html

And of course there's the java.util.regex.Pattern API.

Show your best effort in the form of an SSCCE, and someone will help you with the fine-tuning if needed.
 
Ryan Beckett
Ranch Hand
Posts: 192


Since I've just given you the answer, at least let me explain it, so you can learn how I did it.

Start off by reviewing the literature in the Regex API linked above. It's a good reference, but if you've never done regular expressions, check out the tutorials first.

(1)

This is the range of the specific Latin Unicode characters you expect to be in the input. See the Latin Unicode chart for details.

(2)

This means match (or allow) any word character (0-9, A-Z, a-z, or underscore).

(3)

Allow whitespace characters.

(4)

Allow any punctuation character.

(5)

Allow all of the previously declared characters "and not" this one. Whatever punctuation you don't want needs to be included inside the brackets.

(6)

This is a greedy quantifier. It says to allow "one or more of all of these characters" in the string. Note that the character set must be enclosed in square brackets so that the quantifier applies to the whole set.

Also note that the backslashes in these specifiers are doubled because the pattern is written as a Java string literal. Hope that helps. Good luck.
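
Here is a minimal sketch of how the six pieces described above might fit together. It assumes the range in (1) is \u00C0-\u00FF and the class in (2) is \w (both are named later in this thread), and that the remaining pieces are \s, \p{Punct}, an intersection with [^#], and the + quantifier. The class and method names are hypothetical and only there for illustration.

```java
import java.util.regex.Pattern;

// Hypothetical helper; the names are not from the thread.
class CommentValidator {

    // Assumed combination of the six pieces:
    // (1) U+00C0 to U+00FF  accented Latin-1 characters
    // (2) \w                word characters (0-9, A-Z, a-z, underscore)
    // (3) \s                whitespace
    // (4) \p{Punct}         ASCII punctuation
    // (5) &&[^#]            intersection: everything so far, but not '#'
    // (6) +                 one or more of the allowed characters
    private static final Pattern ALLOWED =
            Pattern.compile("[\u00C0-\u00FF\\w\\s\\p{Punct}&&[^#]]+");

    // matches() requires the whole input to consist of allowed characters.
    static boolean isAllowed(String input) {
        return ALLOWED.matcher(input).matches();
    }

    public static void main(String[] args) {
        // Non-ASCII test data is written with Unicode escapes so the
        // result does not depend on the source file encoding.
        String accented = "\u00C5\u00C4\u00D6\u00E5\u00E4\u00F6-,.:'"
                + "\u00E9\u00FC\u00E1\u00E8\u00E7%()@";

        System.out.println(isAllowed("I am allowed"));
        System.out.println(isAllowed("I am not allowed #"));
        System.out.println(isAllowed(accented));
    }
}
```

With this pattern only '#' is excluded, so every other ASCII punctuation character is still accepted.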
 
Carlos Bonzilla
Greenhorn
Posts: 17
Ryan Beckett wrote:

See Latin Unicode.


Thanks for your help, Ryan. I think some more characters need to be excluded. For instance, the string "Hey how are u$[*?" passed the test although it shouldn't.

Best regards
/Carlos
 
Ryan Beckett
Ranch Hand
Posts: 192
Try this.
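
One possible way to tighten the pattern is sketched below, on the assumption that the fix is simply to extend the exclusion set with the characters reported above ($, [, * and ?). The exact exclusion list, the class name and the method name are assumptions for illustration. Note that [ has to be escaped inside a Java character class, because an unescaped [ would start a nested class.

```java
import java.util.regex.Pattern;

// Hypothetical helper; the names are not from the thread.
class StricterCommentValidator {

    // Same allowed set as before, but the excluded characters are now
    // '#', '$', '[', '*' and '?'. The exclusion list is an assumption.
    private static final Pattern ALLOWED =
            Pattern.compile("[\u00C0-\u00FF\\w\\s\\p{Punct}&&[^#$\\[*?]]+");

    static boolean isAllowed(String input) {
        return ALLOWED.matcher(input).matches();
    }

    public static void main(String[] args) {
        System.out.println(isAllowed("I am allowed"));
        System.out.println(isAllowed("Hey how are u$[*?"));
    }
}
```

If the list of unwanted punctuation keeps growing, listing only the punctuation that is allowed may be easier to maintain than excluding characters one by one.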

 
Carlos Bonzilla
Greenhorn
Posts: 17
Ryan Beckett wrote: Try this.



Thanks for your explanation, Ryan. I am very new to regular expressions, so I will definitely read your links.

Best regards
/Carlos
 
Rob Spoor
Sheriff
Posts: 20669
I'd probably use \\p{L} and \\d instead of \u00C0-\u00FF and \\w; \\p{L} includes a-z and A-Z, so \\w can be replaced by \\d. \\p{L} also includes all Unicode letters, including some of the more exotic ones (Spanish, Scandinavian, etc).
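
A minimal sketch of that substitution, assuming the rest of the pattern keeps whitespace, punctuation and the '#' exclusion discussed earlier in the thread. The class and method names are hypothetical. The underscore that \w used to cover is still accepted here because it is part of \p{Punct}.

```java
import java.util.regex.Pattern;

// Hypothetical helper; the names are not from the thread.
class UnicodeCommentValidator {

    // \p{L} matches any Unicode letter and \d matches the digits 0-9.
    // The remaining pieces (\s, \p{Punct}, &&[^#]) are assumed.
    private static final Pattern ALLOWED =
            Pattern.compile("[\\p{L}\\d\\s\\p{Punct}&&[^#]]+");

    static boolean isAllowed(String input) {
        return ALLOWED.matcher(input).matches();
    }

    public static void main(String[] args) {
        // U+0141 and U+017A lie outside the U+00C0 to U+00FF range,
        // so only the \p{L} version accepts them.
        String polishCity = "\u0141\u00F3d\u017A 1";

        System.out.println(isAllowed(polishCity));
        System.out.println(isAllowed("I am not allowed #"));
    }
}
```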
 