Problem with Regex

 
Greenhorn
Posts: 21
I was under the impression that \\ would escape characters in a character class, but it doesn't seem to work for me. I get this error:

java.util.regex.PatternSyntaxException: Illegal octal escape sequence near index 4
[^\0-9A-Z\p{Blank}
^_`~\p{Punct}]

I need to allow the following characters:
\
0-9
A-Z
\p{Blank}
\r
\n
^_`~
\p{Punct}
[]{}|

How should I write the regex?
I tried this (as a Java string literal):
[^\\0-9A-Z\\p{Blank}\r\n^_`~\\p{Punct}\\[\\]\\{\\}\\|]
 
Bartender
Posts: 1845
To match a literal backslash \ with a regex written as a Java string, you need FOUR backslashes.
The regex itself needs two backslashes (\\).
Each of those must be escaped again in the Java string literal, for a total of four (\\\\).
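A minimal sketch of the two escaping layers, using only java.util.regex (the class and method names are illustrative). It also reproduces the original error: two backslashes in the source give the regex \0, which Pattern reads as the start of an octal escape.

```java
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

public class BackslashEscapes {
    // Java source "\\\\"  ->  String contents \\  ->  regex meaning: one literal backslash
    static final String ONE_BACKSLASH_REGEX = "\\\\";

    // True if s consists of exactly one backslash character.
    static boolean isSingleBackslash(String s) {
        return Pattern.matches(ONE_BACKSLASH_REGEX, s);
    }

    // True if the regex compiles, false if Pattern rejects it.
    static boolean compiles(String regex) {
        try {
            Pattern.compile(regex);
            return true;
        } catch (PatternSyntaxException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(ONE_BACKSLASH_REGEX.length()); // 2 characters reach the regex engine
        System.out.println(isSingleBackslash("\\"));      // true: "\\" in source is ONE backslash
        // Two backslashes in the source give the regex [^\0-9]: illegal octal escape.
        System.out.println(compiles("[^\\0-9]"));         // false
        // Four backslashes give [^\\0-9]: a literal backslash, then the range 0-9.
        System.out.println(compiles("[^\\\\0-9]"));       // true
    }
}
```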

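You can also escape the backslashes in \r and \n (writing \\r and \\n), so the regex engine sees the escapes \r and \n rather than raw carriage-return and line-feed characters.
Also, do you really mean to start with a ^? Inside square brackets a leading ^ negates the class, i.e. it matches any character EXCEPT the ones you list.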

String escapePattern = "[^\\\\0-9A-Z\\p{Blank}\\r\\n^_`~\\p{Punct}\\[\\]\\{\\}\\|]";
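A small sketch showing what the pattern above actually matches (the class and method names are illustrative). Because of the leading ^, it compiles fine but matches a single character that is NOT in the list, such as a lowercase letter.

```java
import java.util.regex.Pattern;

public class NegatedClassDemo {
    // The suggested pattern. The leading ^ negates the class, so it
    // matches any ONE character that is NOT in the list.
    static final String ESCAPE_PATTERN =
        "[^\\\\0-9A-Z\\p{Blank}\\r\\n^_`~\\p{Punct}\\[\\]\\{\\}\\|]";
    static final Pattern P = Pattern.compile(ESCAPE_PATTERN);

    // matches() requires the whole input to be exactly one such character.
    static boolean matchesOneChar(String s) {
        return P.matcher(s).matches();
    }

    public static void main(String[] args) {
        System.out.println(matchesOneChar("a"));  // true: lowercase is not in the list
        System.out.println(matchesOneChar("A"));  // false: listed (A-Z)
        System.out.println(matchesOneChar("\\")); // false: listed (literal backslash)
        System.out.println(matchesOneChar("\n")); // false: listed (\n)
    }
}
```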

Hope this helps some,
evnafets
[ July 12, 2005: Message edited by: Stefan Evans ]
 
Stefan Evans
Bartender
Posts: 1845
Oh, and \p{Punct} (the name is case-sensitive) includes all of these characters: !"#$%&'()*+,-./:;<=>?@[\]^_`{|}~

Thus you don't need to include them in your regex explicitly.
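A quick sketch to confirm that every character in that list, including \ [ ] { } | ^ _ ` ~, is covered by \p{Punct} (the class and method names are illustrative).

```java
import java.util.regex.Pattern;

public class PunctCheck {
    // The 32 characters listed above for \p{Punct}
    // (\" and \\ are Java escapes for " and \).
    static final String LISTED = "!\"#$%&'()*+,-./:;<=>?@[\\]^_`{|}~";

    // True if every character of s is in \p{Punct}.
    static boolean allPunct(String s) {
        return Pattern.matches("\\p{Punct}+", s);
    }

    public static void main(String[] args) {
        System.out.println(LISTED.length());  // 32
        System.out.println(allPunct(LISTED)); // true
        System.out.println(allPunct("A"));    // false: letters are not punctuation
    }
}
```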

Check out java.util.regex.Pattern for the full pattern syntax.

Actually, there are quite a few predefined classes you could use.
As the contents of a character class, \p{Upper}\p{Digit}\p{Blank}\p{Punct}\r\n should do it, i.e. [\p{Upper}\p{Digit}\p{Blank}\p{Punct}\r\n].
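A minimal sketch checking that this class covers every character from the original requirements list (backslash, 0-9, A-Z, blanks, \r, \n, ^_`~ and []{}|). The names VALID_CHARS and REQUIRED are illustrative.

```java
import java.util.regex.Pattern;

public class AllowedCharsCheck {
    // Class contents from the POSIX classes above; \\r\\n become the regex escapes \r \n.
    static final String VALID_CHARS = "\\p{Upper}\\p{Digit}\\p{Blank}\\p{Punct}\\r\\n";
    static final Pattern ALL_VALID = Pattern.compile("[" + VALID_CHARS + "]+");

    // Every character from the original requirements list, as Java literals.
    static final String REQUIRED =
        "\\" + "0123456789" + "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
        + " \t" + "\r\n" + "^_`~" + "[]{}|";

    // True if s is non-empty and every character is in the class.
    static boolean allValid(String s) {
        return ALL_VALID.matcher(s).matches();
    }

    public static void main(String[] args) {
        System.out.println(allValid(REQUIRED)); // true: the classes cover the whole list
        System.out.println(allValid("ABCdef")); // false: lowercase is not allowed
    }
}
```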

What is it you are trying to filter out? Just lower case letters?
In that case [^\p{Lower}] would do it.
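If lowercase letters really are the only thing to reject, a sketch like this would do (names are illustrative). Note that [^\p{Lower}] accepts every other character, including digits, newlines and any non-ASCII letters, since \p{Lower} is only a-z by default.

```java
import java.util.regex.Pattern;

public class NoLowercaseCheck {
    // [^\p{Lower}] matches any single character that is not a-z;
    // the + extends it so matches() checks the whole string.
    static final Pattern NO_LOWER = Pattern.compile("[^\\p{Lower}]+");

    static boolean hasNoLowercase(String s) {
        return NO_LOWER.matcher(s).matches();
    }

    public static void main(String[] args) {
        System.out.println(hasNoLowercase("ZTX777\nZTREND")); // true
        System.out.println(hasNoLowercase("ZTXid3"));         // false
    }
}
```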
 
Taco Fleur
Greenhorn
Posts: 21
Hi, thanks.

At least it doesn't throw an error this time, but it doesn't work as it should either ;-)

Pattern myValidation = Pattern.compile( "[^" + VALID_MESSAGE_PATTERN + "]", Pattern.DOTALL );
myMatch = myValidation.matcher( messagePart );
isValid = myMatch.matches();
if ( !isValid )
{
setError( "Invalid message format, message: " + messagePart + " Invalid characters are: " + messagePart.replaceAll( "[" + VALID_MESSAGE_PATTERN + "]", "" ) );
}

This is the string I run it on: ZHDASCTXID0400\nZTX777Y20050711777571\nZTRENDTXID3\n

The output is:
Invalid message format, message: ZHDASCTXID0400
ZTX777Y20050711777571
ZTRENDTXID3
Invalid characters are:

which I do not understand, because it says there are invalid characters, but when I remove all valid characters from the string nothing is left.
 
Stefan Evans
Bartender
Posts: 1845
The pattern as given matches ONE character only.
To make it match one or more characters, put a + on the end of it,
e.g. [A-Z]+ matches one or more characters from A to Z.
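A tiny sketch of the difference the + makes when the whole string has to match (method names are illustrative).

```java
public class QuantifierDemo {
    // Exactly one character from A-Z.
    static boolean oneChar(String s) {
        return s.matches("[A-Z]");
    }

    // One or more characters from A-Z.
    static boolean oneOrMore(String s) {
        return s.matches("[A-Z]+");
    }

    public static void main(String[] args) {
        System.out.println(oneChar("A"));     // true
        System.out.println(oneChar("ABC"));   // false: more than one character
        System.out.println(oneOrMore("ABC")); // true
        System.out.println(oneOrMore(""));    // false: + needs at least one character
    }
}
```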

Rather than trying to get the whole regex in one fell swoop, I would recommend starting small and building it up a bit at a time,
i.e. [A-Z]+, then [A-Z0-9]+, then add punctuation...

Also be aware that [^abc] matches any single character EXCEPT a, b or c, which seems quite different from what you want. I don't see why you are wrapping your whole pattern in [^ ... ] when constructing it.
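A sketch of why the [^ ... ] wrapper inverts the check. VALID here is a simplified stand-in (an assumption) for the real set of allowed characters; with matches(), the negated version is true only when EVERY character is invalid.

```java
import java.util.regex.Pattern;

public class WrappedNegationDemo {
    // Simplified stand-in for the real valid-character set (assumption).
    static final String VALID = "A-Z0-9";

    // "[^" + VALID + "]+" with matches(): true only if EVERY character is invalid.
    static boolean allInvalid(String s) {
        return Pattern.matches("[^" + VALID + "]+", s);
    }

    // "[" + VALID + "]+" with matches(): true if every character is valid.
    static boolean allValid(String s) {
        return Pattern.matches("[" + VALID + "]+", s);
    }

    public static void main(String[] args) {
        for (String s : new String[] {"ABC123", "abc", "ABc"}) {
            System.out.println(s + ": allInvalid=" + allInvalid(s) + ", allValid=" + allValid(s));
        }
    }
}
```

A mixed string such as "ABc" fails both checks, which is why the negated version reports valid messages as invalid.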

Good luck,
evnafets
[ July 12, 2005: Message edited by: Stefan Evans ]
 
Taco Fleur
Greenhorn
Posts: 21
Hi,

First I want to make sure there are no characters other than the allowed ones, i.e. the ones in the regex. Then, if the message does contain characters outside the allowed set, I want to display those invalid characters. Removing all valid characters should leave only the invalid ones, right?

The removal works like a charm: it removes all the valid characters. It is the check for characters outside the allowed set that is giving me trouble.
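A minimal sketch of that two-step approach. The value of VALID_MESSAGE_PATTERN is an assumption (the class contents suggested earlier in the thread; the real constant is not shown), and the helper names are hypothetical. The check uses the non-negated class with matches(), and the report uses replaceAll() to strip the valid characters.

```java
import java.util.regex.Pattern;

public class MessageValidator {
    // Assumed value: the real VALID_MESSAGE_PATTERN is not shown in the thread.
    static final String VALID_MESSAGE_PATTERN =
        "\\p{Upper}\\p{Digit}\\p{Blank}\\p{Punct}\\r\\n";

    // * rather than + so an empty message counts as valid; use + to reject it.
    static final Pattern ALL_VALID = Pattern.compile("[" + VALID_MESSAGE_PATTERN + "]*");

    static boolean isValid(String messagePart) {
        return ALL_VALID.matcher(messagePart).matches();
    }

    // Removing every valid character leaves only the invalid ones.
    static String invalidChars(String messagePart) {
        return messagePart.replaceAll("[" + VALID_MESSAGE_PATTERN + "]", "");
    }

    static void check(String messagePart) {
        if (!isValid(messagePart)) {
            System.out.println("Invalid characters are: " + invalidChars(messagePart));
        } else {
            System.out.println("Valid");
        }
    }

    public static void main(String[] args) {
        check("ZHDASCTXID0400\nZTX777Y20050711777571\nZTRENDTXID3\n"); // Valid
        check("ZTXid3");                                               // Invalid characters are: id
    }
}
```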
 
Taco Fleur
Greenhorn
Posts: 21
I added the + as you suggested. I come from another language where you can specify "ALL", i.e. remove all characters in the set ;-)

Pattern myValidation = Pattern.compile( "[^" + VALID_MESSAGE_PATTERN + "]+", Pattern.DOTALL );
myMatch = myValidation.matcher( messagePart );
isValid = myMatch.matches();
if ( !isValid )
{

However it still complains saying the message is invalid.
 
Taco Fleur
Greenhorn
Posts: 21
Even if I only do
Pattern myValidation = Pattern.compile( "[A-Z]+", Pattern.DOTALL );
myMatch = myValidation.matcher( messagePart );
isValid = myMatch.matches();
it still tells me the match is false.
Maybe I am just missing something vital here?
The string contains characters between A and Z, so it should match, right?
I am just trying to take it one step at a time as you suggested.
 
Stefan Evans
Bartender
Posts: 1845
There is a subtle difference between the two methods, matches() and replaceAll().

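matches() returns true only if the WHOLE string matches the pattern. I think it is currently failing because you didn't have the + sign, so the pattern fails on anything longer than one character. (Your [A-Z]+ test fails for a different reason: the string also contains digits and newlines, which [A-Z] does not cover.)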

They are both working as expected.
See if this example code explains it.
Note the difference between pattern and pattern2 is the + sign.
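A minimal sketch of the same comparison, using the message from this thread. It is illustrative rather than the example code referred to above, and VALID_CLASS is a simplified valid set (an assumption). It shows matches() with and without the +, replaceAll() with the one-character pattern, and why [A-Z]+ alone fails.

```java
import java.util.regex.Pattern;

public class MatchesVsReplaceAll {
    static final String MESSAGE = "ZHDASCTXID0400\nZTX777Y20050711777571\nZTRENDTXID3\n";
    // Simplified valid set (assumption): upper-case letters, digits, CR and LF.
    static final String VALID_CLASS = "[\\p{Upper}\\p{Digit}\\r\\n]";

    static final Pattern pattern = Pattern.compile(VALID_CLASS);        // exactly one character
    static final Pattern pattern2 = Pattern.compile(VALID_CLASS + "+"); // one or more characters

    public static void main(String[] args) {
        // matches() has to consume the WHOLE input.
        System.out.println(pattern.matcher(MESSAGE).matches());  // false
        System.out.println(pattern2.matcher(MESSAGE).matches()); // true
        // replaceAll() replaces every match in turn, so even the
        // one-character pattern strips every valid character.
        System.out.println(pattern.matcher(MESSAGE).replaceAll("").isEmpty()); // true
        // [A-Z]+ still fails: the message also contains digits and newlines.
        System.out.println(MESSAGE.matches("[A-Z]+")); // false
    }
}
```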

[ July 12, 2005: Message edited by: Stefan Evans ]
 