Counting exact matches of substring.

 
Ranch Hand
Posts: 51
I have a string that contains words, numbers, line breaks, punctuation, and all sorts of other characters.
I want to count the number of exact occurrences of some words in the string.

I am experimenting using the following code


I am trying to work out how the regular expression should look when I want exact matches, e.g. given the text "foobar" and the substring "foo", the count should be 0.
The regular expression

almost works for counting occurrences of "foo", but not quite.
 
Ranch Hand
Posts: 72
IntelliJ IDE Oracle Java
First of all, there is a nice overview of regular expressions in the Java API documentation for the Pattern class (here). I even use it for reference when working with regexps in other languages.
Also, you may want to have a look at a pretty good tutorial on regular expressions from Sun here.
So, before asking such questions, you could try to figure it out by first learning the basics of regular expressions.
Anyway, the correct pattern in your case would be:
^foo$
As you can find in the documentation for the Pattern class, ^ stands for the beginning of a line, and $ for the end.
 
Ranch Hand
Posts: 103
Netbeans IDE Eclipse IDE Java
Hi Michael,

As Anton suggested, you probably need to look at your regular expression. As your expression stands, I believe it would match 1foo98, which I guess is not what you want.

On a personal note, if all you want is to count the number of occurrences, you might just as well use the Scanner class.
 
Michael Boehm
Ranch Hand
Posts: 51

Anton Shaykin wrote: First of all, there is a nice overview of regular expressions in the Java API documentation for the Pattern class (here). I even use it for reference when working with regexps in other languages.
Also, you may want to have a look at a pretty good tutorial on regular expressions from Sun here.
So, before asking such questions, you could try to figure it out by first learning the basics of regular expressions.
Anyway, the correct pattern in your case would be:
^foo$
As you can find in the documentation for the Pattern class, ^ stands for the beginning of a line, and $ for the end.



I am familiar with the basics of regular expressions.
Have a look at the question again and you will see that ^foo$ is not the correct pattern in my case, as I want to count "foo" every time it appears as a word in a string.
The string might for instance be "baz23! foos23foo bar foobar barfoo!foo" and the count should be 2.
 
Michael Boehm
Ranch Hand
Posts: 51

jishnu dasgupta wrote:Hi Michael,

As Anton suggested, you probably need to look at your regular expression. As your expression stands, I believe it would match 1foo98, which I guess is not what you want.

On a personal note, if all you want is to count the number of occurrences, you might just as well use the Scanner class.



I would want to count that as an occurrence. It seems like \bfoo\b should work [EDIT: Absolutely not]
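A quick way to see why \bfoo\b does not fit this requirement: \b marks a boundary between a \w character ([a-zA-Z_0-9]) and anything else, so a digit next to "foo" suppresses the boundary. The sketch below is only an illustration; the class and method names are made up for this example.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class WordBoundaryDemo {
    // Counts non-overlapping matches of regex in text.
    static int count(String regex, String text) {
        Matcher m = Pattern.compile(regex).matcher(text);
        int n = 0;
        while (m.find()) {
            n++;
        }
        return n;
    }

    public static void main(String[] args) {
        // Digits are word characters, so there is no \b between '1' and 'f'.
        System.out.println(count("\\bfoo\\b", "1foo98"));   // prints 0
        // Whitespace and punctuation are not word characters, so this matches.
        System.out.println(count("\\bfoo\\b", "a foo, b")); // prints 1
    }
}
```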
 
jishnu dasgupta
Ranch Hand
Posts: 103
Netbeans IDE Eclipse IDE Java

Michael Boehm wrote:
The string might for instance be "baz23! foos23foo bar foobar barfoo!foo" and the count should be 2.



Michael, isn't the word "foo" actually appearing 5 times in this String?
 
Michael Boehm
Ranch Hand
Posts: 51

jishnu dasgupta wrote:
Michael, isn't the word "foo" actually appearing 5 times in this String?



Not the way I want to count it. I only want to count exact matches, so for me "foo" only appears twice, since it isn't counted in e.g. "foos" and "foobar".
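One reading of this rule that gives 2 for the sample string is: "foo" counts when it has no letter directly before or after it, while digits, whitespace and punctuation are allowed as neighbours. That reading is an assumption pieced together from the posts above (1foo98 counts, foos and foobar do not), and it is restricted to ASCII letters here. A minimal sketch with made-up class and method names:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class ExactWordCount {
    // Counts occurrences of word that have no ASCII letter directly
    // before or after them (assumed definition of an "exact" match).
    static int count(String text, String word) {
        Pattern p = Pattern.compile(
                "(?<![A-Za-z])" + Pattern.quote(word) + "(?![A-Za-z])");
        Matcher m = p.matcher(text);
        int n = 0;
        while (m.find()) {
            n++;
        }
        return n;
    }

    public static void main(String[] args) {
        // The foo after "23" and the foo after "!" are counted.
        System.out.println(count("baz23! foos23foo bar foobar barfoo!foo", "foo")); // prints 2
        System.out.println(count("foobar", "foo")); // prints 0
    }
}
```

Pattern.quote keeps the search word literal, so a word containing regex metacharacters would not change the meaning of the pattern.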
 
Sheriff
Posts: 22849
132
Eclipse IDE Spring Chrome Java Windows
So what you want is foo, preceded by nothing, whitespace or punctuation, and followed by nothing, whitespace or punctuation. That looks like a job for positive lookahead / lookbehind:
(?<=^|\s|\p{Punct})foo(?=$|\s|\p{Punct})

That will only result in one match:
- foos23foo does not match since this is one word containing foo, not the word foo itself
- foobar does not match since this is one word containing foo, not the word foo itself
- barfoo does not match since this is one word containing foo, not the word foo itself
- foo matches since it's preceded by only a punctuation character
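For anyone who wants to try this pattern, here is a small check against the sample string from earlier in the thread. The class name is made up for the example. It gives one match rather than the two Michael expects, because the second foo in foos23foo is preceded by a digit, which is neither whitespace nor punctuation.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class LookaroundCheck {
    // foo preceded by start of input, whitespace or punctuation,
    // and followed by end of input, whitespace or punctuation.
    static final Pattern FOO =
            Pattern.compile("(?<=^|\\s|\\p{Punct})foo(?=$|\\s|\\p{Punct})");

    // Counts non-overlapping matches of FOO in text.
    static int count(String text) {
        Matcher m = FOO.matcher(text);
        int n = 0;
        while (m.find()) {
            n++;
        }
        return n;
    }

    public static void main(String[] args) {
        // Only the final foo (after '!') matches.
        System.out.println(count("baz23! foos23foo bar foobar barfoo!foo")); // prints 1
    }
}
```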
 
Anton Shaikin
Ranch Hand
Posts: 72
IntelliJ IDE Oracle Java

That looks like a job for positive lookahead / lookbehind


Exactly, and that goes far beyond "Beginning Java". Regular expressions are all about formalizing your requirements, so first you have to define what you mean by "word", because in the common regexp vocabulary a word character is described by the pattern [a-zA-Z_0-9]. As I see it, you mean something different in your case.
 
Marshal
Posts: 80656
477

Anton Shaykin wrote: . . . that goes far beyond the "Beginning Java". . . ..

Agree. Moving thread.
 
Michael Boehm
Ranch Hand
Posts: 51
I managed to do what I wanted. I used an appropriate Pattern and then counted by using split on the string containing the text. However, this is quite slow.
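The actual pattern and split code were not posted, so the following is only a guess at the shape of that approach, set next to a Matcher.find() loop. The pattern is a stand-in (no ASCII letter on either side of foo) and the class and method names are made up. The find() loop is likely cheaper because it compiles the Pattern once and does not build an array of substrings just to count them.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class FindVersusSplit {
    // Stand-in pattern, compiled once and reused.
    static final Pattern FOO = Pattern.compile("(?<![A-Za-z])foo(?![A-Za-z])");

    // Counts matches by walking the text with find(); no substrings are created.
    static int countByFind(String text) {
        Matcher m = FOO.matcher(text);
        int n = 0;
        while (m.find()) {
            n++;
        }
        return n;
    }

    // Counts matches by splitting: n matches produce n + 1 pieces.
    // The limit of -1 keeps trailing empty pieces, so a match at the end counts.
    static int countBySplit(String text) {
        return FOO.split(text, -1).length - 1;
    }

    public static void main(String[] args) {
        String text = "baz23! foos23foo bar foobar barfoo!foo";
        System.out.println(countByFind(text));  // prints 2
        System.out.println(countBySplit(text)); // prints 2
    }
}
```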
 
Ranch Hand
Posts: 441
Scala IntelliJ IDE Windows
This works for what you described in your example, although it may be what you have already: