
String ReplaceAll regex problem

 
Steve Buck
Ranch Hand
Posts: 45
I've extracted some HTML data into a string. I now want to remove all <sup> tags and the data between them.

For example:
abc<sup>(<a href="#junk" title="See cross-reference AM">AM</a> </sup>def

should become:
abcdef

I've tried to use replaceAll, but I'm not very familiar with regex patterns. So far I've tried "<sup>.*</sup>", but that does not work: the extracted data contains many <sup> tags, so the pattern matches from the first opening tag all the way to the last closing tag, and everything in between gets erased.

Is there any way to specify that the inner pattern (currently .*) may be anything BUT </sup>? That would work for me, since it would then stop at the very first closing tag. I'm not sure how to add such an exclusion.

Thanks
 
Jim Yingst
Wanderer
Sheriff
Posts: 18671
Try

<sup>.*?</sup>

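The '?' modifies the preceding '*', making it reluctant (non-greedy): it matches using the smallest number of repetitions of '.' (any character) that still allows the whole pattern to match. The match therefore ends at the first </sup> after each <sup> instead of at the last one in the string.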
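
To make the difference concrete, here is a minimal sketch comparing the greedy and reluctant patterns with String.replaceAll. The class name SupStripper, the method names, and the second <sup>x</sup>ghi fragment appended to the sample input are made up for illustration; only the two patterns come from this thread. Note that '.' does not match line terminators by default, so if a <sup> element can span several lines you would probably need to prefix the pattern with (?s); that is an assumption about the data, not something stated above.

```java
class SupStripper {

    // Greedy: '.*' grabs as much as possible, so the match runs from the
    // first <sup> to the LAST </sup> in the string.
    static String stripGreedy(String html) {
        return html.replaceAll("<sup>.*</sup>", "");
    }

    // Reluctant: '.*?' grabs as little as possible, so each match ends at
    // the first </sup> that follows its opening <sup>.
    static String stripReluctant(String html) {
        return html.replaceAll("<sup>.*?</sup>", "");
    }

    public static void main(String[] args) {
        // Sample from the question, plus a hypothetical second <sup> element
        // so the greedy behaviour becomes visible.
        String input = "abc<sup>(<a href=\"#junk\" title=\"See cross-reference AM\">AM</a> </sup>"
                + "def<sup>x</sup>ghi";

        System.out.println(stripGreedy(input));    // prints "abcghi" ("def" is lost)
        System.out.println(stripReluctant(input)); // prints "abcdefghi"
    }
}
```

This only handles <sup> elements that are not nested and carry no attributes; for arbitrary HTML a real parser is usually the safer choice.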
 
Steve Buck
Ranch Hand
Posts: 45
Wow you saved my day

Thanks
 