Easy Regular Expressions Question

 
Greenhorn
Posts: 7
I have this pattern:
<b>(.+?)</b>.*?<td valign=top>(.+?)<br>(.*?)<br><small>.*?<td valign=top><nobr>(.+?)</nobr>
but I need this part:
<br>(.*?)
to be optional. In other words, that whole piece may be missing. For example, I could encounter 1<br>2 or just 1 at that point.
How should I code that part?
Thanks.
 
Wanderer
Posts: 18671
Do you need to capture the text after that optional <br> if it's there? If not, the simplest solution is just to ignore that one <br> entirely, and just look for the <br><small> part:
<b>(.+?)</b>.*?<td valign=top>(.+?)<br><small>.*?<td valign=top><nobr>(.+?)</nobr>
Even if you want to know whether the <br> is there or not, you can test group 2 afterwards with a separate regex. You don't have to do everything in one expression - it can get too confusing.
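A minimal sketch of that two-step approach, assuming each record sits on a single line and looks roughly like the sample strings below (the class name, method name, and sample data are made up for illustration). The simplified pattern captures everything before <br><small> in group 2, and a plain split on <br> then separates the optional second value.

```java
import java.util.Arrays;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class TwoStepMatch {
    // The simplified pattern from the post: group 2 is everything up to <br><small>.
    static final Pattern ROW = Pattern.compile(
        "<b>(.+?)</b>.*?<td valign=top>(.+?)<br><small>.*?<td valign=top><nobr>(.+?)</nobr>");

    /** Returns {bold text, first value, second value or null, last value}, or null if nothing matches. */
    static String[] parse(String html) {
        Matcher m = ROW.matcher(html);
        if (!m.find()) {
            return null;
        }
        // Second step: look inside group 2 for the optional <br>.
        String[] parts = m.group(2).split("<br>", 2);
        String second = parts.length > 1 ? parts[1] : null;
        return new String[] { m.group(1), parts[0], second, m.group(3) };
    }

    public static void main(String[] args) {
        // Hypothetical sample records, one with the optional part and one without.
        String both = "<b>Widget</b> x <td valign=top>1<br>2<br><small>n</small><td valign=top><nobr>5.00</nobr>";
        String one  = "<b>Widget</b> x <td valign=top>1<br><small>n</small><td valign=top><nobr>5.00</nobr>";
        System.out.println(Arrays.toString(parse(both))); // [Widget, 1, 2, 5.00]
        System.out.println(Arrays.toString(parse(one)));  // [Widget, 1, null, 5.00]
    }
}
```

Keeping the second step outside the main regex means the main pattern has no optional parts to backtrack over.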
Here's another alternative - I inserted whitespace and comments for clarity; compile with Pattern.COMMENTS so these will be ignored.
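As an illustration of the Pattern.COMMENTS style, here is a sketch of how a commented pattern with an optional part might be laid out. It is not necessarily the pattern from the original post; the class name and sample data are assumptions, and it is meant for input that holds one record at a time, since the lazy .*? inside the optional part can otherwise run on into a later record.

```java
import java.util.Arrays;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class CommentedPattern {
    // In COMMENTS mode unescaped whitespace is ignored and '#' starts a comment
    // that runs to the end of the line, so the space in "<td valign=top>" is
    // written as \s below.
    static final Pattern ROW = Pattern.compile(
          "<b>(.+?)</b>               # group 1: bold text\n"
        + ".*?<td\\svalign=top>       # skip ahead to the first cell\n"
        + "(.+?)                      # group 2: first value\n"
        + "(?:<br>(.*?))?             # optional part, group 3: second value\n"
        + "<br><small>                # marker that follows the values\n"
        + ".*?<td\\svalign=top><nobr> # skip ahead to the last cell\n"
        + "(.+?)</nobr>               # group 4: last value\n",
        Pattern.COMMENTS);

    /** Returns groups 1 to 4 (group 3 is null when the optional part is absent), or null if nothing matches. */
    static String[] parse(String html) {
        Matcher m = ROW.matcher(html);
        if (!m.find()) {
            return null;
        }
        return new String[] { m.group(1), m.group(2), m.group(3), m.group(4) };
    }

    public static void main(String[] args) {
        // Hypothetical single-record inputs.
        String both = "<b>Widget</b> x <td valign=top>1<br>2<br><small>n</small><td valign=top><nobr>5.00</nobr>";
        String one  = "<b>Widget</b> x <td valign=top>1<br><small>n</small><td valign=top><nobr>5.00</nobr>";
        System.out.println(Arrays.toString(parse(both))); // [Widget, 1, 2, 5.00]
        System.out.println(Arrays.toString(parse(one)));  // [Widget, 1, null, 5.00]
    }
}
```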

I think this will lead to a lot of backtracking which may impact performance. Better may be the following, using negative lookahead:
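One way a negative lookahead could be applied here, as a sketch rather than the pattern from the original post: placing (?!<small>) right after the optional <br> stops the optional part from claiming the <br> that belongs to <br><small>. That spares the engine a long scan for a second value that isn't there, and keeps a match from spilling into the next record. The class name and the two-record sample line are assumptions.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class LookaheadGuard {
    // (?!<small>) makes the optional part fail immediately when its <br>
    // is actually the start of the required <br><small> marker.
    static final Pattern ROW = Pattern.compile(
        "<b>(.+?)</b>.*?<td valign=top>(.+?)(?:<br>(?!<small>)(.*?))?<br><small>"
        + ".*?<td valign=top><nobr>(.+?)</nobr>");

    /** Returns one {bold, first, second or null, last} array per record found. */
    static List<String[]> parseAll(String html) {
        List<String[]> rows = new ArrayList<>();
        Matcher m = ROW.matcher(html);
        while (m.find()) {
            rows.add(new String[] { m.group(1), m.group(2), m.group(3), m.group(4) });
        }
        return rows;
    }

    public static void main(String[] args) {
        // Hypothetical input: two records on one line, the first without the optional part.
        String html =
              "<b>A</b> <td valign=top>1<br><small>n</small><td valign=top><nobr>x</nobr>"
            + "<b>B</b> <td valign=top>3<br>4<br><small>n</small><td valign=top><nobr>y</nobr>";
        for (String[] row : parseAll(html)) {
            System.out.println(String.join(" | ", row[0], row[1], String.valueOf(row[2]), row[3]));
        }
    }
}
```

On this sample the first record yields a null second value and the second record yields 3 and 4. Without the lookahead, the optional part of the first record would stretch forward to the <br><small> of the second record.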

[ August 05, 2003: Message edited by: Jim Yingst ]
 
Ranch Hand
Posts: 1923
Scala Postgres Database Linux
can't you nest braces like this:
(<br>(.*?))?
?
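A small sketch of what that nesting does to the group numbers, assuming single-record input like the hypothetical samples below. The outer parentheses become group 3 (which includes the <br>), the inner value becomes group 4, and the last cell moves to group 5. Both 3 and 4 come back null when the optional part is missing. Writing the wrapper as (?:<br>(.*?))? instead would keep the original numbering.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class NestedGroups {
    // The original pattern with <br>(.*?) wrapped in an optional capturing group.
    static final Pattern ROW = Pattern.compile(
        "<b>(.+?)</b>.*?<td valign=top>(.+?)(<br>(.*?))?<br><small>.*?<td valign=top><nobr>(.+?)</nobr>");

    /** Summarises groups 2 to 5 so the shifted numbering is visible. */
    static String describe(String html) {
        Matcher m = ROW.matcher(html);
        if (!m.find()) {
            return "no match";
        }
        return "first=" + m.group(2) + ", wrapper=" + m.group(3)
             + ", second=" + m.group(4) + ", last=" + m.group(5);
    }

    public static void main(String[] args) {
        String both = "<b>Widget</b> x <td valign=top>1<br>2<br><small>n</small><td valign=top><nobr>5.00</nobr>";
        String one  = "<b>Widget</b> x <td valign=top>1<br><small>n</small><td valign=top><nobr>5.00</nobr>";
        System.out.println(describe(both)); // first=1, wrapper=<br>2, second=2, last=5.00
        System.out.println(describe(one));  // first=1, wrapper=null, second=null, last=5.00
    }
}
```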
[ August 05, 2003: Message edited by: Stefan Wagner ]
 
Jim Yingst
Wanderer
Posts: 18671
can't you nest braces like this:
(<br>(.*?))?

Yeah. When I first considered that possibility I was thinking that this would not interact well (efficiency-wise) with the (.+?) immediately before it. I distrust having too many *'s, +'s and ?'s nested in close proximity, as they often do strange things. But as I think more about this specific example, I guess it's OK.

I still like my very first answer best though.
[ August 05, 2003: Message edited by: Jim Yingst ]
 
William Wagers
Greenhorn
Posts: 7
Thanks for the suggestions, but neither one works for me. Both solutions leave a <br> in the data and/or get lost, presumably because of the <br><small> which follows.
I was a little confused by the extra group(s). I was able to make it work for me. Thanks.
[ August 08, 2003: Message edited by: William Wagers ]
 