
regex date capture - greed, reluctance, and precedence problem

 
Chris Treglio
Ranch Hand
Posts: 64

I've got a long list of strings containing dates in the format dd-mmm-yyyy, which I'm trying to capture. I'd like to handle a missing leading zero in the day part (i.e. properly capture both 01-Jan-2011 and 1-Jan-2011).

My current code doesn't handle leading zeroes.



I thought that by changing the day part to ".*((?:[12][0-9]|3[01]|0?[1-9])-", I would make the leading zero optional but "greedy", so that it gets captured if it's there. It doesn't. Worse, it turns dates like "12-Mar-2011" into "2-Mar-2011". Obviously, I'd want days in the teens, twenties, and thirties to be captured in full too.

What am I doing wrong?
 
Wouter Oet
Saloon Keeper
Posts: 2700
You're trying to reinvent the wheel. Why use regex to parse dates when you can use DateFormat/SimpleDateFormat?
 
Chris Treglio
Ranch Hand
Posts: 64
I'm not really trying to turn a String into a Date object, I'm trying to pull out String dates from a longer String filled with other stuff. My String sources are like "bla bla blah blah 02-Apr-2011 blah bla blah".

Can you do that with DateFormat/SimpleDateFormat?
 
Wouter Oet
Saloon Keeper
Posts: 2700
Aha. I'm not sure if that is possible.
 
Mike Simmons
Ranch Hand
Posts: 3090
I'm guessing the issue is that the initial .* is greedy, and that wins out over your other intentions here. It will probably swallow a leading 1, 2, or 3 as well, not just a leading 0, as long as at least one digit is left afterward to match the rest of the expression.
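
To make that concrete, here is a small sketch of the greedy behaviour. The day alternation is the one from the question; the month part ([A-Za-z]{3}), the year part (\\d{4}), the trailing .* and the class and method names are my own assumptions, since the full original pattern wasn't shown.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class GreedyPrefixDemo {
    // Day alternation taken from the question; month and year parts are assumed.
    static final String DATE = "((?:[12][0-9]|3[01]|0?[1-9])-[A-Za-z]{3}-\\d{4})";

    // Matches the whole input as prefix + date + anything, and returns group 1.
    static String capture(String prefix, String input) {
        Matcher m = Pattern.compile(prefix + DATE + ".*").matcher(input);
        return m.matches() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        // The greedy prefix grabs everything, then backtracks only as far as
        // it must, so the date group is left with the shortest usable day.
        System.out.println(capture(".*", "blah 12-Mar-2011 blah")); // 2-Mar-2011
        System.out.println(capture(".*", "blah 01-Jan-2011 blah")); // 1-Jan-2011
    }
}
```

The engine gives back characters from the right one at a time, so the first place the date part can succeed is at the last digit of the day, where 0?[1-9] matches on its own.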

I suggest either:

(a) replace .* with .*?, which is reluctant

or

(b) drop the .* entirely, and replace matches() with find().
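
Both options can be sketched roughly like this. Again, only the day alternation comes from the question; the month and year parts and the names DateCapture, withReluctantPrefix and withFind are placeholders I made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class DateCapture {
    // Day alternation from the question; month and year parts are assumed.
    static final String DATE = "(?:[12][0-9]|3[01]|0?[1-9])-[A-Za-z]{3}-\\d{4}";

    // Option (a): keep matches(), but make the leading .* reluctant.
    static final Pattern WHOLE_LINE = Pattern.compile(".*?(" + DATE + ").*");

    // Option (b): no surrounding .*, scan the input with find().
    static final Pattern DATE_ONLY = Pattern.compile(DATE);

    // Returns the first date in the input, or null if there is none.
    static String withReluctantPrefix(String input) {
        Matcher m = WHOLE_LINE.matcher(input);
        return m.matches() ? m.group(1) : null;
    }

    // Returns every date in the input, in order of appearance.
    static List<String> withFind(String input) {
        List<String> dates = new ArrayList<String>();
        Matcher m = DATE_ONLY.matcher(input);
        while (m.find()) {
            dates.add(m.group());
        }
        return dates;
    }

    public static void main(String[] args) {
        String s = "bla bla blah blah 02-Apr-2011 blah bla blah";
        System.out.println(withReluctantPrefix(s));                 // 02-Apr-2011
        System.out.println(withReluctantPrefix("x 12-Mar-2011 y")); // 12-Mar-2011
        System.out.println(withFind("a 1-Jan-2011 b 31-Dec-2012")); // [1-Jan-2011, 31-Dec-2012]
    }
}
```

One practical difference: find() also picks up more than one date per string, while the matches() version only gives you the first. Note that . does not match line terminators by default, so the matches() version assumes single-line input.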
 