Regex - match group that contains word AND does not contain another word

 
Ioan Damian Sirbu
Greenhorn
Posts: 18
Greetings,

I am having trouble finding a regex pattern that matches any text containing one group of letters while, at the same time, not containing another group of letters.
Iterating over a file line by line, I need to extract the lines that contain the word 'input' AND do not contain the word 'type'.
So 'input damian whatever' is a match, while 'input damian type whatever' is not.

Any ideas?
 
Vivek Singh
Ranch Hand
Posts: 92
So why is a regular expression required?
Since you know the exact text you need, try this.

 
Ioan Damian Sirbu
Greenhorn
Posts: 18
I was giving an arbitrary example.
The concrete situation is:
- I need to make a search in Eclipse in all files.
- I need to find the files that contain a custom tag that is like this <input type="calendar"> but is not like <input type="calendar" theme="simple">

I think using Eclipse's regex search is an option, and at the same time this regex dilemma is interesting in itself.
 
Ioan Damian Sirbu
Greenhorn
Posts: 18
So, any ideas?
 
Jeanne Boyarsky
author & internet detective
Marshal
Posts: 34974
Damian Sarbu wrote: So, any ideas?

Yes. Negative lookahead/lookbehind.

It asserts that a given regular expression does not match after (lookahead) or before (lookbehind) the part you are interested in, without consuming any text itself.
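
As an illustration of the negative lookahead idea applied to the original 'input' / 'type' question, here is a minimal Java sketch. The class name `InputWithoutType` and the method `keep` are made up for this example, and the pattern is only one possible way to write it: the lookahead sits at the start of the line so that it checks the whole line for 'type' before anything is matched.

```java
import java.util.regex.Pattern;

class InputWithoutType {
    // Hypothetical pattern: the whole line must contain "input",
    // and the lookahead at the start rejects any line containing "type".
    static final Pattern INPUT_NO_TYPE = Pattern.compile("^(?!.*type).*input.*$");

    // Returns true if the line should be extracted.
    static boolean keep(String line) {
        return INPUT_NO_TYPE.matcher(line).matches();
    }

    public static void main(String[] args) {
        String[] lines = {"input damian whatever", "input damian type whatever"};
        for (String line : lines) {
            System.out.println(line + " -> " + keep(line));
        }
    }
}
```

Note that this works on substrings, so a line containing a word such as 'prototype' would also be rejected; word boundaries (`\b`) could be added if whole words are required.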
 
Ioan Damian Sirbu
Greenhorn
Posts: 18
Thank you, I actually found a good tutorial right here: http://www.javaranch.com/journal/2003/04/RegexTutorial.htm
For whoever is interested, the regex should look like this:
((.*calendar.*)(?! .*simple.*))
 
Ireneusz Kordal
Ranch Hand
Posts: 423
Damian Sarbu wrote: For whoever is interested, the regex should look like this
((.*calendar.*)(?! .*simple.*))

Maybe this pattern works in Eclipse, but in Java it doesn't work as you expect:

Results:
true
false
true
true
 
Rob Spoor
Sheriff
Posts: 20669
Using .* or similar non-deterministic regex sequences inside lookaheads and lookbehinds usually doesn't do what you want.
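
One way to see the problem with the pattern posted earlier, sketched in Java: the greedy `.*` in front of the lookahead consumes the rest of the line, so by the time the negative lookahead is checked there is nothing left for it to reject. The class name `GreedyLookaheadDemo` and the `ANCHORED` alternative are assumptions made for this illustration, not something taken from the thread.

```java
import java.util.regex.Pattern;

class GreedyLookaheadDemo {
    // The pattern as posted; the space after "?!" is a literal space in Java.
    static final Pattern POSTED = Pattern.compile("((.*calendar.*)(?! .*simple.*))");

    // Hypothetical alternative: scan the whole line for "simple" first,
    // then require "calendar" somewhere in it.
    static final Pattern ANCHORED = Pattern.compile("^(?!.*simple).*calendar.*$");

    public static void main(String[] args) {
        String plain = "<input type=\"calendar\">";
        String themed = "<input type=\"calendar\" theme=\"simple\">";
        for (String tag : new String[] {plain, themed}) {
            System.out.println(tag
                + " posted=" + POSTED.matcher(tag).find()
                + " anchored=" + ANCHORED.matcher(tag).find());
        }
    }
}
```

With `find()`, the posted pattern reports a match for both tags, while the anchored variant matches only the tag without theme="simple".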
 
Ioan Damian Sirbu
Greenhorn
Posts: 18
No, it was my fault. I posted the wrong pattern.
By matching (.*calendar.*), I was capturing the whole expression. If the input was "calendar simple", the
lookahead (.*simple.*) would have nothing left to match.
The correct pattern would be (calendar)(?!.*simple.*). This would return true for "calendar" or "calendar some words", but false for "calendar simple".

I tested this with the RegexTestHarness in the Sun tutorials.
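
For anyone who wants to repeat the check outside the RegexTestHarness, a small Java sketch of the corrected pattern follows. The class name `CalendarNotSimple` and the method `found` are invented for this example, and the fourth sample string is an extra case added here to show a limitation: a lookahead only inspects text to the right of "calendar".

```java
import java.util.regex.Pattern;

class CalendarNotSimple {
    // "calendar" that is not followed, later on the same line, by "simple".
    static final Pattern P = Pattern.compile("(calendar)(?!.*simple.*)");

    // Uses find(), so the pattern may match anywhere in the text.
    static boolean found(String text) {
        return P.matcher(text).find();
    }

    public static void main(String[] args) {
        String[] samples = {
            "calendar", "calendar some words", "calendar simple", "simple calendar"
        };
        for (String s : samples) {
            System.out.println(s + " -> " + found(s));
        }
    }
}
```

The first three samples behave as described above (true, true, false). "simple calendar" still matches, because the word "simple" appears before "calendar" and the lookahead never sees it.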



PS: Now that I think I understand how this works, I am trying to combine lookahead with lookbehind.




 