String Pattern matching

 
Greenhorn
Posts: 22
Spring AngularJS Java

Output:
OCPJP
2013

I understand why the pattern matched 'OCPJP'. But how did it also match '2013', since the \\b was already used in matching 'OCPJP'?
Please help me clear this up.
 
Sheriff
Posts: 9674
42
Android Google Web Toolkit Hibernate IntelliJ IDE Spring Java
Hi Shahana, I think you are thinking about the regex incorrectly. The entire pattern is applied repeatedly, once per match. The first time, "\\b\\w+\\D\\b" matches "OCPJP "; the next time, it matches "2013 ". After that there is no further match, because OCPJP7 ends in a digit, which \D cannot match, and there is no space after it. "2013 " matched because \D matched the space after 2013...
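
To make the repeated matching concrete, here is a minimal sketch. The original code is not shown in this thread, so the input string "OCPJP 2013 OCPJP7" is an assumption pieced together from the matches discussed above; the class name RepeatedFindDemo and the helper findAll are hypothetical as well. Each call to find() applies the whole pattern again, starting where the previous match ended.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RepeatedFindDemo {
    // Collects every substring that successive find() calls report for the regex.
    static List<String> findAll(String regex, String input) {
        List<String> matches = new ArrayList<>();
        Matcher matcher = Pattern.compile(regex).matcher(input);
        while (matcher.find()) {
            matches.add(matcher.group());
        }
        return matches;
    }

    public static void main(String[] args) {
        // Assumed input: not given in the thread, reconstructed from the replies.
        String input = "OCPJP 2013 OCPJP7";
        for (String match : findAll("\\b\\w+\\D\\b", input)) {
            // Brackets make the trailing space consumed by \D visible.
            System.out.println("[" + match + "]");
        }
        // Prints:
        // [OCPJP ]
        // [2013 ]
    }
}
```

The brackets show that both matches end in a space, which is easy to miss in the plain output.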
 
shahana kareem
Greenhorn
Posts: 22
Spring AngularJS Java
Actually, I believed that a character in the string, once matched by a pattern, won't be reused. Here \\b\\w+\\D\\b matches "OCPJP ", so the word boundary between the space and the "2" in 2013 has already been used for "OCPJP ". How is it reused for "2013 "?
 
Ankit Garg
Sheriff
Posts: 9674
42
Android Google Web Toolkit Hibernate IntelliJ IDE Spring Java
That's a good point. \b also matches the boundary at the start of the String. So my guess is that even if the space before the 2 was consumed by the previous match (by \D), \b can treat the 2 as the start of a String and match it as a word boundary...
 
author
Posts: 23883
142
jQuery Eclipse IDE Firefox Browser VI Editor C++ Chrome Java Linux Windows

shahana kareem wrote: Actually, I believed that a character in the string, once matched by a pattern, won't be reused. Here \\b\\w+\\D\\b matches "OCPJP ", so the word boundary between the space and the "2" in 2013 has already been used for "OCPJP ". How is it reused for "2013 "?



Two points. First, the \\b matches either transition: from a word character to a non-word character *or* from a non-word character to a word character. It also matches at the start and end of input when a word character is there, as that is also a transition. So, for the first match ("OCPJP "), the first \\b is the start of the input, and the second \\b is the transition from the space to the 2.

Second, word boundaries are zero length. A boundary is only the transition between characters; it does not need to "capture" either character involved. In this case the space is returned with the match, but only because it is consumed by the \\D, not by the \\b. Basically, word boundaries don't have "reuse" criteria to worry about, as there are no characters to "capture". The same boundary can end one match and start the next.

Henry
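
A small sketch of both points, again assuming the input "OCPJP 2013 OCPJP7" (the original code is not shown in the thread). The class name BoundaryPositionsDemo and its two helper methods are hypothetical. The first helper lists the indexes where \\b succeeds on its own; the second lists the start and end index of each match of the full pattern.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class BoundaryPositionsDemo {
    // Indexes at which the zero-length \b assertion succeeds.
    static List<Integer> boundaryPositions(String input) {
        List<Integer> positions = new ArrayList<>();
        Matcher m = Pattern.compile("\\b").matcher(input);
        while (m.find()) {
            positions.add(m.start()); // start() equals end() for an empty match
        }
        return positions;
    }

    // "start-end" index pairs for each match of regex in input.
    static List<String> matchSpans(String regex, String input) {
        List<String> spans = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(input);
        while (m.find()) {
            spans.add(m.start() + "-" + m.end());
        }
        return spans;
    }

    public static void main(String[] args) {
        String input = "OCPJP 2013 OCPJP7"; // assumed input, not from the thread
        System.out.println(boundaryPositions(input));           // [0, 5, 6, 10, 11, 17]
        System.out.println(matchSpans("\\b\\w+\\D\\b", input)); // [0-6, 6-11]
    }
}
```

Index 6 is the boundary between the space and the 2. It closes the first match (0-6) and opens the second (6-11), and no character is consumed twice: the space at index 5 belongs only to the first match.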
 