
Doubt regarding Pattern

 
Mansukhdeep Thind
Ranch Hand
Posts: 1158
Hi

Have a look at the following code:



The pattern printed is:

\p{javaWhitespace}+

How is this happening? What does the line tell the JVM to do with the scanned input?
 
Rommel Sharma
Greenhorn
Posts: 18
It's printing the delimiter in use. In this case it is built on:

\p{javaWhitespace}, which is equivalent to java.lang.Character.isWhitespace()

where

\p{javaWhitespace} is the regular-expression construct that matches any single valid Java whitespace character.

The Java documentation for Character.isWhitespace() says the following:

A character is a Java whitespace character if and only if it satisfies one of the following criteria:
It is a Unicode space character (SPACE_SEPARATOR, LINE_SEPARATOR, or PARAGRAPH_SEPARATOR) but is not also a non-breaking space ('\u00A0', '\u2007', '\u202F').
It is '\u0009', HORIZONTAL TABULATION.
It is '\u000A', LINE FEED.
It is '\u000B', VERTICAL TABULATION.
It is '\u000C', FORM FEED.
It is '\u000D', CARRIAGE RETURN.
It is '\u001C', FILE SEPARATOR.
It is '\u001D', GROUP SEPARATOR.
It is '\u001E', RECORD SEPARATOR.
It is '\u001F', UNIT SEPARATOR.
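
To see the equivalence in practice, here is a minimal sketch that checks a few sample characters against both Character.isWhitespace() and the regex class. The class name JavaWhitespaceDemo, the helper matchesClass and the choice of sample characters are my own, purely for illustration.

```java
import java.util.regex.Pattern;

class JavaWhitespaceDemo {

    // One character from the class, without the + quantifier.
    private static final Pattern ONE_WHITESPACE = Pattern.compile("\\p{javaWhitespace}");

    // True if the regex class accepts the single character c.
    static boolean matchesClass(char c) {
        return ONE_WHITESPACE.matcher(String.valueOf(c)).matches();
    }

    public static void main(String[] args) {
        // space, tab, line feed, FILE SEPARATOR, non-breaking space, a letter
        char[] samples = {' ', '\t', '\n', '\u001C', '\u00A0', 'x'};
        for (char c : samples) {
            System.out.printf("U+%04X isWhitespace=%b regex=%b%n",
                    (int) c, Character.isWhitespace(c), matchesClass(c));
        }
    }
}
```

Both columns should agree for every character. Note in particular that the non-breaking space U+00A0 is reported as false by both, as the criteria above say.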

Thanks,
Rommel.

 
Matthew Brown
Bartender
Posts: 4568
Mansukhdeep Thind wrote: What does the line tell the JVM to do with the scanned input?

It doesn't tell it to do anything. It's just getting the Pattern that is currently being used by the Scanner to process any input it receives (which you then print out). Since your code doesn't set the pattern anywhere, this must be the default pattern used by Scanner. Which is what the documentation says (java.util.Scanner):
A Scanner breaks its input into tokens using a delimiter pattern, which by default matches whitespace.
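
The code from the first post is not reproduced in this thread, so the following is only a minimal sketch of what it presumably looked like. The class name DefaultDelimiterDemo, the helper defaultDelimiter and the input string are assumptions made up for illustration. It shows that delimiter() merely reports the Pattern the Scanner is using and reads no input.

```java
import java.util.Scanner;
import java.util.regex.Pattern;

class DefaultDelimiterDemo {

    // Returns the regex source of the delimiter a freshly created Scanner uses.
    static String defaultDelimiter() {
        try (Scanner scanner = new Scanner("hello world")) {
            Pattern delimiter = scanner.delimiter(); // a plain getter, consumes nothing
            return delimiter.pattern();
        }
    }

    public static void main(String[] args) {
        System.out.println(defaultDelimiter()); // prints \p{javaWhitespace}+
    }
}
```

Calling useDelimiter(...) on the Scanner before delimiter() would change what is printed, which is why the unmodified Scanner shows the default.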
 
Mansukhdeep Thind
Ranch Hand
Posts: 1158
I understand that the default delimiter is a whitespace character. Then what is "\p" for? And why does it print "+", which is a greedy quantifier matching one or more whitespace characters? It should simply use {javaWhitespace}. Where does the rest of the regex come from (the \p and the +)?
 
Winston Gutkowski
Bartender
Posts: 10508
Mansukhdeep Thind wrote: Where does the rest of the regex come from (the \p and the +)?

Have a look at the docs for java.util.regex.Pattern. It explains all this stuff.

Winston
 
Mansukhdeep Thind
Ranch Hand
Posts: 1158
That is too much information to digest in one go, Winston. It would be like searching for a needle in a haystack. Could you be more specific about which heading I should read under?
 
Stephan van Hulst
Bartender
Posts: 6127
\p{javaWhitespace} simply means "one whitespace character". The \p is part of the character class syntax: \p{name} matches a single character that has the named property. Without it, {...} would be interpreted as a quantifier, and since "javaWhitespace" is not a number, it would likely throw an exception.
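
A quick way to check this is to try compiling both forms. This is only a sketch; the class name BracePatternDemo and the helper compiles are made up for illustration.

```java
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

class BracePatternDemo {

    // True if the regex compiles, false if Pattern rejects it.
    static boolean compiles(String regex) {
        try {
            Pattern.compile(regex);
            return true;
        } catch (PatternSyntaxException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(compiles("\\p{javaWhitespace}")); // true
        System.out.println(compiles("{javaWhitespace}"));    // false
    }
}
```

Printing e.getMessage() inside the catch block shows the reason Pattern gives for rejecting the second form.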

The + is needed because you want an entire run of whitespace to be seen as one single delimiter. Without the quantifier, the Scanner would also return empty-string tokens between two consecutive spaces.
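
The effect of the quantifier can be sketched by tokenizing the same input with and without it. The class name QuantifierDemo, the helper tokens and the sample input are assumptions for illustration only.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;

class QuantifierDemo {

    // Splits input into tokens using the given delimiter regex.
    static List<String> tokens(String input, String delimiterRegex) {
        List<String> result = new ArrayList<>();
        try (Scanner scanner = new Scanner(input).useDelimiter(delimiterRegex)) {
            while (scanner.hasNext()) {
                result.add(scanner.next());
            }
        }
        return result;
    }

    public static void main(String[] args) {
        String input = "a  b"; // two spaces between the letters
        System.out.println(tokens(input, "\\p{javaWhitespace}+")); // [a, b]
        System.out.println(tokens(input, "\\p{javaWhitespace}"));  // [a, , b]
    }
}
```

With the + the two spaces form one delimiter. Without it each space is its own delimiter, so the nothing between them comes back as an empty token.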

Note that it's worth it to read and understand the entire Pattern javadoc.
 
Matthew Brown
Bartender
Posts: 4568
Look for "character classes". Basically, {javaWhitespace} isn't a valid regular expression. \p{javaWhitespace} is. And it uses + because by default it treats multiple spaces as if they were a single space.
 
Mansukhdeep Thind
Ranch Hand
Posts: 1158
Point noted, Stephan. I will devote time to reading through the Pattern documentation and trying things out.
 
Winston Gutkowski
Bartender
Posts: 10508
Mansukhdeep Thind wrote: That is too much information to digest in one go, Winston. It would be like searching for a needle in a haystack. Could you be more specific about which heading I should read under?

Personally, I just use Ctrl+F.

The fact is that at some point you will have to digest all that information, so why not now, when you actually need it?

Winston
 