Regular Expression Pattern Help Needed

 
Ranch Hand
Posts: 31
I have been struggling to achieve the following using regex in Java:

For example, if I have the following 5 sentences:

1. A123456 some 2008 junk in my PC
2. A234567 another pile of 2008 junk
3. A345678 collected 2009 rocks
4. A456 got that in 2009.
5. A567890 sent me 3 letters.


What I want to achieve is to pick out the sentences that:
a). begin with A followed by 6 digits
b). do NOT have the word 2008 in them.

This should give me sentences 3) and 5).

How do I construct a regex pattern?

Greatly appreciate it

Thank you

-RickM

 
Greenhorn
Posts: 21
I believe this is what you want: "A\\d{6}((?!2008).)*$". I ran it on your examples and it worked, but I would try it with a few more because I'm not 100% sure about it; my regex skills are a bit rusty.

The A catches your first A, and \d{6} stands for "any 6 digits", but you need the extra \ because Java treats the first \ in a string literal as an escape character. ((?!2008).)*$ means "any number of characters, up to the end of the line, that don't contain 2008". Hope this works for you =)
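
For anyone who wants to try it, here is a minimal sketch of that pattern run against the five example sentences (with the "1." numbering stripped). The class name SingleRegexFilter and the method filter are made up for this example, and it assumes each sentence is tested on its own with Matcher.matches():

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical helper, not from the thread: keeps only the lines the pattern accepts.
final class SingleRegexFilter {
    // matches() must cover the whole line, so the missing leading ^ does not matter here.
    static final Pattern WANTED = Pattern.compile("A\\d{6}((?!2008).)*$");

    static List<String> filter(List<String> lines) {
        List<String> kept = new ArrayList<>();
        for (String line : lines) {
            if (WANTED.matcher(line).matches()) {
                kept.add(line);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList(
                "A123456 some 2008 junk in my PC",
                "A234567 another pile of 2008 junk",
                "A345678 collected 2009 rocks",
                "A456 got that in 2009.",
                "A567890 sent me 3 letters.");
        // Prints sentences 3 and 5.
        filter(lines).forEach(System.out::println);
    }
}
```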
 
author
Posts: 23956

Daniel Croft wrote:I believe this is what you want: "A\\d{6}((?!2008).)*$" I ran it on your examples and it worked, but I would try it with a few more because I'm not 100% sure about it, my regex skills are a bit rusty.

The A catches your first A, \d{6} stands for "any 6 digits" but you need the extra \ because java will recognize the first \ as an escape character. ((?!2008).)*$ means "any number of characters that don't contain 2008". Hope this works for you =)



Have you tested what happens when A is followed by more than 6 digits?

Furthermore, while it does work, I don't like the way the negative look ahead works. It tests it with every character.

Henry
 
Daniel Croft
Greenhorn
Posts: 21
Good catch on the more-than-6-digits part. You're right: the ((?!2008).)* after the \d{6} will happily consume further digits, so the pattern accepts more than 6 digits, although his specs don't technically disallow this. As for the negative lookahead, I more or less lifted that from a Google search; it was the best I could find on short notice, and I suspect there's a better way.
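
To make the seven-digit case concrete, here is a small sketch. The STRICT pattern is only one possible way to tighten things (a (?!\d) lookahead right after the six digits) and is my assumption, not something anyone in this thread proposed; the class and method names are made up too.

```java
import java.util.regex.Pattern;

// Hypothetical comparison of the posted pattern with a tightened variant.
final class SixDigitCheck {
    // Pattern as posted: anything without "2008" may follow, including more digits.
    static final Pattern LOOSE = Pattern.compile("A\\d{6}((?!2008).)*$");
    // Assumed variant: (?!\\d) fails if a seventh digit follows the first six.
    static final Pattern STRICT = Pattern.compile("A\\d{6}(?!\\d)((?!2008).)*$");

    static boolean loose(String s) {
        return LOOSE.matcher(s).matches();
    }

    static boolean strict(String s) {
        return STRICT.matcher(s).matches();
    }

    public static void main(String[] args) {
        String sevenDigits = "A1234567 seven digits";
        System.out.println(loose(sevenDigits));   // true
        System.out.println(strict(sevenDigits));  // false
        System.out.println(strict("A345678 collected 2009 rocks")); // true
    }
}
```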
 
Henry Wong
author
Posts: 23956

Ricky Murphy wrote:I have been struggling to achieve the following using regex in Java:

For example, if I have the following 5 sentences:

1. A123456 some 2008 junk in my PC
2. A234567 another pile of 2008 junk
3. A345678 collected 2009 rocks
4. A456 got that in 2009.
5. A567890 sent me 3 letters.


What I want to achieve is to pick out the sentences that:
a). begin with A followed by 6 digits
b). do NOT have the word 2008 in them.

This should give me sentences 3) and 5).

How do I construct a regex pattern?




If you do *not* know regexes well (or are just a beginner), I recommend doing this with two regexes: one that looks for the first condition and another that looks for the second. The first regex should succeed and the second should fail.

As already mentioned, it is possible to use a single regex -- but for a beginner, it may be a good idea to avoid that until you are more comfortable with regexes. At least, to the point where you know what "negative look ahead" means.
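
A minimal sketch of the two-regex approach, assuming each sentence is checked on its own. The class name TwoRegexFilter and the method wanted are made up for illustration:

```java
import java.util.regex.Pattern;

// Hypothetical helper: one regex must be found, the other must not.
final class TwoRegexFilter {
    // First condition: the line starts with A followed by six digits.
    static final Pattern STARTS_WITH_CODE = Pattern.compile("^A\\d{6}");
    // Second condition: "2008" appears somewhere in the line.
    static final Pattern HAS_2008 = Pattern.compile("2008");

    static boolean wanted(String line) {
        // The first regex should succeed and the second should fail.
        return STARTS_WITH_CODE.matcher(line).find()
                && !HAS_2008.matcher(line).find();
    }

    public static void main(String[] args) {
        String[] lines = {
            "A123456 some 2008 junk in my PC",
            "A234567 another pile of 2008 junk",
            "A345678 collected 2009 rocks",
            "A456 got that in 2009.",
            "A567890 sent me 3 letters."
        };
        for (String line : lines) {
            System.out.println(wanted(line) + "  " + line);
        }
    }
}
```

Each pattern does one job, so neither needs a lookahead.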

Henry
 
Ricky Murphy
Ranch Hand
Posts: 31
Thank you Daniel and Henry.

I think that I know what I did wrong (but not why). I started with the "positive" comparison:
a). begins with A and followed by 6 digits
b). AND has the word 2008 in it.

So I ended up with the regex ^A\d{6}\s.*(2008).*$ (and ^A\d{6}\s.*((2008).)*$ works too).

While the above worked for me, I thought that (?!2008) would give me what I originally wanted (i.e. the regex ^A\d{6}\s.*(?!2008).*$). Wrong: even with the added ?!, I still got a match.

Comparing with your solution, you didn't have the "\s.*" in your regex. So I removed that from my version of the positive match, and the positive match stopped working. So my conclusion is that for a positive match I need \s.* after the {6}, while for a negative match with ?! I need to remove the \s.*. Could you help me understand why?

Not sure if I explained it clearly.

Thank you,

-RickM
 
Henry Wong
author
Posts: 23956

Ricky Murphy wrote:
I think that I know what I did wrong (but not why). I started with the "positive" comparison:
a). begins with A and followed by 6 digits
b). AND has the word 2008 in it.

So I ended up with the regex ^A\d{6}\s.*(2008).*$ (and ^A\d{6}\s.*((2008).)*$ works too).




I still think that you should do it as two different regexes, instead of trying to do both with one regex. It isn't as easy as you think it is.

Henry
 
Henry Wong
author
Posts: 23956

Ricky Murphy wrote:
While the above worked for me, I thought that (?!2008) would give me what I originally wanted (i.e. the regex ^A\d{6}\s.*(?!2008).*$). Wrong: even with the added ?!, I still got a match.



First of all, it is a negative lookahead, not a negative match; i.e. "(?!2008)" is not the opposite of "(2008)".

But to answer why it always matches: the first ".*" greedily takes the whole rest of the string. That leaves the negative lookahead sitting at the end of the string, where there is no "2008" ahead of it, so it always succeeds, and the last ".*" always has a zero-length match.
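
One way to see this for yourself is to put capturing groups around the two ".*" parts and print what each one took. This is only a sketch; the groups are my addition (they do not change what the regex matches), and the class name GreedyLookaheadDemo is made up:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo: shows which ".*" consumes what in the regex from the question.
final class GreedyLookaheadDemo {
    // Same regex as in the question, with groups added around both ".*" parts.
    static final Pattern P = Pattern.compile("^A\\d{6}\\s(.*)(?!2008)(.*)$");

    // Returns {text taken by first .*, text taken by second .*}, or null if no match.
    static String[] split(String line) {
        Matcher m = P.matcher(line);
        if (!m.matches()) {
            return null;
        }
        return new String[] { m.group(1), m.group(2) };
    }

    public static void main(String[] args) {
        String[] parts = split("A123456 some 2008 junk in my PC");
        System.out.println("first  .* = [" + parts[0] + "]"); // [some 2008 junk in my PC]
        System.out.println("second .* = [" + parts[1] + "]"); // []
    }
}
```

The "2008" ends up inside the first group, so the lookahead never gets a chance to see it.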

Henry
 
Ricky Murphy
Ranch Hand
Posts: 31

But to answer why it always matches: the first ".*" greedily takes the whole rest of the string. That leaves the negative lookahead sitting at the end of the string, where there is no "2008" ahead of it, so it always succeeds, and the last ".*" always has a zero-length match.



Thank you Henry. I will certainly give some thought to breaking it into two regexes. It will surely be clearer when I revisit it later.

First of all, it is a negative lookahead, not a negative match; i.e. "(?!2008)" is not the opposite of "(2008)".



You are right that (2008) is not the opposite of (?!2008). Then is (?=2008) the opposite of (?!2008)? While ^A\d{6}((?!2008).)*$ gives me good negative filtering,
^A\d{6}((?=2008).)*$ does NOT give me any positive findings (I tried other combinations too, such as ((?=(2008)), etc.). Could you please help me with this? I am not very clear on how lookahead mixes with the rest of a regular expression.

-RickM
 
Henry Wong
author
Posts: 23956

Ricky Murphy wrote:
While ^A\d{6}((?!2008).)*$ gives me good negative filtering,



^A\d{6}((?!2008).)*$ -- looks for an "A" followed by six digits, followed by zero or more of any character, with the qualification that a "2008" must not start at the position of any of those characters. In other words, after the first 7 places it will match any string, as long as a "2008" does not begin at any location.

Ricky Murphy wrote:
^A\d{6}((?=2008).)*$ does NOT give me any positive findings



^A\d{6}((?=2008).)*$ -- looks for an "A" followed by six digits, followed by zero or more of any character, with the qualification that a "2008" must start at the position of each of those characters. In other words, after the first 7 places it will match any string, as long as a "2008" begins at every location. Since that is not possible, the only way to match is for the group to repeat zero times, i.e. when nothing at all follows the six digits.
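
A short sketch to confirm this. The second pattern, with a single (?=.*2008) lookahead placed right after the six digits, is my assumption of what a working "positive" single regex could look like; it was not posted by anyone here. The class and method names are made up.

```java
import java.util.regex.Pattern;

// Hypothetical demo of the per-character positive lookahead versus a single lookahead.
final class PositiveLookaheadDemo {
    // Regex from the question: every consumed character must sit at the start of "2008".
    static final Pattern EVERY_POSITION = Pattern.compile("^A\\d{6}((?=2008).)*$");
    // Assumed alternative: look ahead once for "2008" anywhere in the rest of the line.
    static final Pattern SOMEWHERE = Pattern.compile("^A\\d{6}(?=.*2008).*$");

    static boolean everyPosition(String s) {
        return EVERY_POSITION.matcher(s).matches();
    }

    static boolean somewhere(String s) {
        return SOMEWHERE.matcher(s).matches();
    }

    public static void main(String[] args) {
        System.out.println(everyPosition("A123456"));                         // true
        System.out.println(everyPosition("A1234562008"));                     // false
        System.out.println(everyPosition("A123456 some 2008 junk in my PC")); // false
        System.out.println(somewhere("A123456 some 2008 junk in my PC"));     // true
        System.out.println(somewhere("A345678 collected 2009 rocks"));        // false
    }
}
```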

Ricky Murphy wrote:
I tried other combinations too, such as ((?=(2008)), etc.



The extra parens specify another group, which can be used to capture the string in that group. But since the string is always "2008", it doesn't really make much sense to capture it: if it matches, you know what it is.

Henry
 
Bartender
Posts: 10780

Ricky Murphy wrote:Thank you Daniel and Henry.

I think that I know what I did wrong (but not why). I started with the "positive" comparison:
a). begins with A and followed by 6 digits
b). AND has the word 2008 in it.


OK, first off, I agree with Henry: it's far better to do it in two separate passes. For one thing, it's clearer; for another, the search for "2008" probably doesn't warrant a regex anyway.

Secondly, the regex that will find A + at least 6 digits is "A\\d{6,}", and furthermore it will eliminate anything that appears in that initial 'code number' from the search (I presume you don't want to match '2008' in that).

Have you tried "^A\\d{6,}.*\\b2008\\b.*$"? That's what I'd do for a matches() check, but you can probably do a lot better if you use java.util.regex.Pattern and java.util.regex.Matcher (again, if you really think you need to).
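
As a rough sketch, here is that regex with String.matches(), next to a two-pass version that uses a plain contains() for the year. The class name ContainsYearCheck and both method names are made up, and the two-pass variant is just one reading of the "two separate passes" suggestion:

```java
// Hypothetical comparison of the single positive regex with a two-pass check.
final class ContainsYearCheck {
    static final String POSITIVE = "^A\\d{6,}.*\\b2008\\b.*$";

    // Single regex: code number, then "2008" as a whole word somewhere after it.
    static boolean regexOnly(String line) {
        return line.matches(POSITIVE);
    }

    // Two passes: regex for the code number, plain substring search for the year.
    // Unlike regexOnly, this also accepts a "2008" embedded in the code number.
    static boolean twoPass(String line) {
        return line.matches("^A\\d{6,}.*") && line.contains("2008");
    }

    public static void main(String[] args) {
        System.out.println(regexOnly("A123456 some 2008 junk in my PC")); // true
        System.out.println(regexOnly("A345678 collected 2009 rocks"));    // false
        System.out.println(regexOnly("A1232008 some junk"));              // false
        System.out.println(twoPass("A123456 some 2008 junk in my PC"));   // true
    }
}
```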

Winston


 
Ricky Murphy
Ranch Hand
Posts: 31
Thank you guys for all your valuable suggestions. I learned a lot indeed.

-RickM
 