[RegEx] how can I search a string with 5 (or longer) consecutive numbers?

 
Matt Taylor
Ranch Hand
Posts: 72
Hi guys, I want to retrieve any words that contain 5 consecutive numbers. Given the sample string:

"This is 12, 1234, 123456789, 1234567890, 12345q"



How can I retrieve "123456789", and "1234567890" and "12345q"?

I used [^\d]\d{5}[^\d], but it only retrieves exactly 5 consecutive numbers. Please, I need your advice.

TIA
 
Tim Moores
Saloon Keeper
Posts: 4034
  • Likes 2
Using "\d{5,}" rather than "\d{5}" will match runs of at least 5 consecutive digits rather than exactly 5 digits.

If suffixes like "q" are to be matched as well, then the condition "[^\d]" needs to be altered accordingly. Alternatively, add an optional sub-pattern next to the digits that matches letters (and whatever else might be there and should be found).
 
Tony Docherty
Bartender
Posts: 3271
If you are including characters before and after the digits, you could use \b to find the word boundary and then \w* for 0 or more word characters, i.e. \b\w*\d{5,}\w*\b
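
To see how the patterns suggested so far differ, here is a minimal sketch that runs each of them over the sample string from the first post. The class and method names (`PatternComparison`, `findAll`) are made up for illustration, and `Matcher.results()` assumes Java 9 or later.

```java
import java.util.List;
import java.util.regex.MatchResult;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

// Hypothetical demo class; not code from the thread.
class PatternComparison {

    // Collects every match of regex in input, in order of appearance.
    static List<String> findAll(String regex, String input) {
        return Pattern.compile(regex)
                .matcher(input)
                .results()                 // Java 9+: stream of MatchResult
                .map(MatchResult::group)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        String sample = "This is 12, 1234, 123456789, 1234567890, 12345q";

        // Original attempt: exactly five digits fenced by one non-digit on each side.
        System.out.println(findAll("[^\\d]\\d{5}[^\\d]", sample));
        // At least five digits, digits only.
        System.out.println(findAll("\\d{5,}", sample));
        // Whole "word" that contains a run of at least five digits.
        System.out.println(findAll("\\b\\w*\\d{5,}\\w*\\b", sample));
    }
}
```

On this sample the first pattern only finds " 12345q" (including the leading space), the second finds the three digit runs but drops the "q", and the third returns "123456789", "1234567890" and "12345q". Keep in mind that \w covers letters, digits and underscore only, so a hyphen or slash ends the match.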
 
fred rosenberger
lowercase baba
Bartender
Posts: 12563
are you looking for consecutive NUMBERS or consecutive DIGITS?  "12345" is a single number with five digits.  "123  18   393723  345633  -39" is a string of five numbers.

I know I'm being a little pedantic here, but when you are writing specs, these kinds of things do make a difference.
 
Matt Taylor
Ranch Hand
Posts: 72
Thank you very much for your response.

@fred rosenberger

I am looking for consecutive digits and include the letters in the matcher until a whitespace is encountered. In your example, I should retrieve
"393723" and  "345633" separately.

Just a follow-up question, please: can someone tell me how I can use pattern.matcher to iterate over the matched strings?
 
Campbell Ritchie
Marshal
Posts: 56553
That is slightly different from what you said first. You can easily append something like \w* to the end of the regex.
Another way to do it might be to split the String on whitespace and iterate over the resulting array.
 
Knute Snortum
Sheriff
Posts: 4281
The basic loop structure for iterating over matches is:

It depends on exactly what your REGEX is and how it matches.
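
A minimal sketch of the usual `Pattern`/`Matcher` loop, assuming the \b\w*\d{5,}\w*\b pattern suggested earlier in the thread. This is an illustration rather than the code originally posted here, and `MatchLoop` and `collectMatches` are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class; not code from the thread.
class MatchLoop {
    // Assumed pattern: a word containing at least five consecutive digits.
    static final Pattern WORD_WITH_DIGITS = Pattern.compile("\\b\\w*\\d{5,}\\w*\\b");

    static List<String> collectMatches(String input) {
        List<String> found = new ArrayList<>();
        Matcher matcher = WORD_WITH_DIGITS.matcher(input);
        // find() advances to the next match and returns false when none are left.
        while (matcher.find()) {
            // group() is the matched text; start() and end() are its offsets in input.
            System.out.printf("%s at [%d, %d)%n",
                    matcher.group(), matcher.start(), matcher.end());
            found.add(matcher.group());
        }
        return found;
    }

    public static void main(String[] args) {
        collectMatches("This is 12, 1234, 123456789, 1234567890, 12345q");
    }
}
```

Each call to `find()` resumes where the previous match ended, so the loop visits non-overlapping matches from left to right.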
 
Knute Snortum
Sheriff
Posts: 4281
Matt Taylor wrote:I am looking for consecutive digits and include the letters in the matcher until a whitespace is encountered. In your example, I should retrieve
"393723" and  "345633" separately

You will probably need to use groups for that.  Here I have broken up the starting of the regex so it can be commented:
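
As a rough sketch of the idea (not the regex originally posted here), the pattern below is assembled from separately commented pieces and uses two capturing groups. It assumes a token has to start with the digits and ends at the next whitespace; `GroupedRegex`, `TOKEN` and `parse` are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class; the pattern is an assumption, not the thread's original.
class GroupedRegex {
    static final Pattern TOKEN = Pattern.compile(
            "(?<!\\S)"     // only start where a whitespace-delimited token starts
          + "(\\d{5,})"    // group 1: the run of five or more digits
          + "(\\S*)"       // group 2: whatever follows, up to the next whitespace
    );

    // Returns {whole match, group 1, group 2} for every token found.
    static List<String[]> parse(String input) {
        List<String[]> rows = new ArrayList<>();
        Matcher m = TOKEN.matcher(input);
        while (m.find()) {
            rows.add(new String[] { m.group(), m.group(1), m.group(2) });
        }
        return rows;
    }

    public static void main(String[] args) {
        for (String[] row : parse("123  18   393723  345633  -39 12345q")) {
            System.out.println(String.join(" | ", row));
        }
    }
}
```

With fred's example string plus "12345q", this finds "393723" and "345633" separately, and for "12345q" group 1 holds "12345" while group 2 holds "q".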
 
fred rosenberger
lowercase baba
Bartender
Posts: 12563
  • Likes 2
so the hardest thing about regexes (at least in my opinion) is clearly defining exactly what you want to do.

I want to retrieve any words that contain 5 consecutive numbers.
and
I am looking for consecutive digits and include the letters in the matcher until a whitespace is encountered.

are slightly different. And the term "word" is not well-defined either. how many words is "upside-down"?  or "in/out?" What if a word is hyphenated across a new line?  What about something like "abc12345def"


I think another issue is that often, people want to write a single, monolithic regex that does everything in one fell swoop. The issue I see there is that if the specs change, updating that single regex is hard - damn hard.

I'm a big proponent of writing several little regexes that are all simple to understand that can then all be combined - the exact same way you'd write a program or even a complicated method.  So consider:

break the string apart on whitespace
test each token against whatever criteria you have
if all criteria are met, include that token in your final list of tokens.
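
The three steps above can be sketched roughly as follows. The single criterion used here (the token contains five consecutive digits) is an assumption taken from the original question, and `TokenFilter` and `tokensWithDigitRun` are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical demo class; not code from the thread.
class TokenFilter {
    static final Pattern WHITESPACE = Pattern.compile("\\s+");
    // Assumed criterion: five consecutive digits anywhere in the token.
    static final Pattern FIVE_DIGITS = Pattern.compile("\\d{5}");

    static List<String> tokensWithDigitRun(String text) {
        List<String> kept = new ArrayList<>();
        // Step 1: break the string apart on whitespace.
        for (String token : WHITESPACE.split(text.trim())) {
            // Step 2: test the token; further small checks can be added here.
            if (FIVE_DIGITS.matcher(token).find()) {
                // Step 3: keep tokens that pass every check.
                kept.add(token);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        System.out.println(tokensWithDigitRun("123  18   393723  345633  -39"));
        System.out.println(tokensWithDigitRun("This is 12, 1234, 123456789, 1234567890, 12345q"));
    }
}
```

Because the split is on whitespace only, punctuation stays attached to a token: the second call keeps "123456789," and "1234567890," with their commas. Stripping that would be one more small, separate step.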
 
Matt Taylor
Ranch Hand
Posts: 72
Thank you very much for your input. I have a message (like a business letter, about 5000 words long) that has some digits in it, and I need to extract those. As long as a "word" in the message starts with 5 consecutive digits, regardless of whether it contains any special characters or letters after those 5 digits, I need to retrieve the whole word up to the next whitespace, new line, tab, or any other invisible space.

I'm really sorry, I'm no expert in regex and I need help, please.
 
Carey Brown
Saloon Keeper
Posts: 3323
Here's a REGEX that I think would work for you. As you can see, there's a test framework that you can insert "good" and "bad" use cases into to verify whether this meets your needs. If this doesn't work, you'll need to be clearer on your requirements and supply additional good/bad use cases. (Regexes are a maintenance headache.)
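
A minimal sketch of what such a regex and test frame could look like, assuming the requirement from Matt's last post: a token that starts with five digits, returned whole up to the next whitespace. The pattern and the names `LeadingDigitsExtractor`, `extract` and `check` are assumptions for illustration, not the code originally posted here.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class; pattern and test cases are assumptions.
class LeadingDigitsExtractor {
    // A token that starts with five digits and runs to the next whitespace.
    static final Pattern TOKEN = Pattern.compile("(?<!\\S)\\d{5}\\S*");

    static List<String> extract(String message) {
        List<String> found = new ArrayList<>();
        Matcher m = TOKEN.matcher(message);
        while (m.find()) {
            found.add(m.group());
        }
        return found;
    }

    // Tiny test frame: compares extract(input) with the expected tokens.
    static boolean check(String input, String... expected) {
        List<String> actual = extract(input);
        boolean ok = actual.equals(Arrays.asList(expected));
        System.out.println((ok ? "PASS " : "FAIL ") + "\"" + input + "\" -> " + actual);
        return ok;
    }

    public static void main(String[] args) {
        // "Good" cases: the token should be returned whole.
        check("12345", "12345");
        check("ref 12345q,", "12345q,");
        check("a\t1234567890\nb", "1234567890");
        check("12345-AB/7 end", "12345-AB/7");
        // "Bad" cases: nothing should be returned.
        check("1234");
        check("q12345");
        check("This is 12, 1234");
    }
}
```

Two choices here are worth questioning against the real data. Trailing punctuation stays attached ("12345q,"), which follows the "until whitespace" rule literally. Also, by default Java's \s and \S only treat ASCII whitespace as whitespace, so other invisible characters would need extra handling.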

Output

 