
Regular Expression

 
Ranch Hand
Posts: 3640
I have a String that contains HTML data.

I need to achieve two things:
(1) If both <script> and </script> exist in the String, remove everything between them, including the tags themselves.
(2) If <script> exists but </script> doesn't, remove everything from <script> onward, including the tag itself.

How to do this?
 
High Plains Drifter
Posts: 7289
This seems more like an intermediate topic to me.
 
Ranch Hand
Posts: 2412
This should be very easy with indexOf and substring
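A minimal sketch of that indexOf approach, assuming the tags appear exactly as <script> and </script> (lowercase, no attributes, no extra whitespace). The names ScriptCutter and removeScripts are made up for illustration, and the kept ranges are appended to a StringBuilder, which does the same job as calling substring on each range.

```java
final class ScriptCutter {
    private static final String OPEN = "<script>";
    private static final String CLOSE = "</script>";

    /**
     * Removes every <script>...</script> block, tags included.
     * If a <script> has no matching </script>, everything from that tag onward is dropped.
     */
    static String removeScripts(String html) {
        StringBuilder out = new StringBuilder();
        int pos = 0;
        while (true) {
            int start = html.indexOf(OPEN, pos);
            if (start < 0) {
                // No further opening tag: keep the rest of the input.
                out.append(html, pos, html.length());
                return out.toString();
            }
            // Keep the text that precedes the opening tag.
            out.append(html, pos, start);
            int end = html.indexOf(CLOSE, start + OPEN.length());
            if (end < 0) {
                // Case (2): opening tag without a closing tag, drop the remainder.
                return out.toString();
            }
            // Case (1): continue scanning just past the closing tag.
            pos = end + CLOSE.length();
        }
    }
}
```

For example, ScriptCutter.removeScripts("a<script>x</script>b") gives "ab", and ScriptCutter.removeScripts("a<script>x") gives "a". Real pages often write the tag as <script type="..."> or in upper case, which this exact-match version would miss.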
 
Bartender
Posts: 10336
What have you tried so far?
 
Ranch Hand
Posts: 262
This assumes your tags don't have any unnecessary whitespace in them, e.g., between the "<" and the tag name. If that's a possibility, you'll need this even uglier version:
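As a rough sketch of what such a pair of patterns might look like (hypothetical, and not necessarily the regex this post originally contained): the first allows attributes and whitespace before the ">" but none right after the "<", while the second, uglier one tolerates whitespace after "<" and around "/" as well. Both treat the end of the input as an acceptable terminator, so a <script> with no closing tag removes everything after it. The class and method names are assumptions made for the example.

```java
import java.util.regex.Pattern;

final class ScriptRegex {
    // Opening tag with optional attributes, then the shortest run of text
    // that ends at a closing tag or, failing that, at the end of the input.
    private static final Pattern STRICT = Pattern.compile(
            "<script\\b[^>]*>.*?(?:</script\\s*>|\\z)",
            Pattern.CASE_INSENSITIVE | Pattern.DOTALL);

    // Same idea, but also allows whitespace after '<' and around '/'.
    private static final Pattern LENIENT = Pattern.compile(
            "<\\s*script\\b[^>]*>.*?(?:<\\s*/\\s*script\\s*>|\\z)",
            Pattern.CASE_INSENSITIVE | Pattern.DOTALL);

    /** Removes script blocks whose tags have no whitespace right after '<'. */
    static String strip(String html) {
        return STRICT.matcher(html).replaceAll("");
    }

    /** Removes script blocks, tolerating stray whitespace inside the tags. */
    static String stripLenient(String html) {
        return LENIENT.matcher(html).replaceAll("");
    }
}
```

DOTALL lets "." cross line breaks, and the reluctant ".*?" stops at the first closing tag, so two separate script blocks are removed individually rather than together with the text between them.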
 
Wanderer
Posts: 18671
[Alan]: This assumes your tags don't have any unnecessary whitespace in them,

If that's a possibility, shouldn't we simply find the author of the HTML and beat them with a stick? Such so-called HTML won't fly with most browsers anyway, I think. If we're going to worry about such possibilities, why not worry about the ones that are actually legal, such as

<script foo='bar>baz'>

or

<foo bar="b>az">

or

<script>
<!--
</script>
-->
</script>

Admittedly, handling all these cases is nontrivial and quite possibly unnecessary for what Chetan needs here. But if we want to delve into special cases, there are things somewhat more important to worry about than whitespace after a <, I think.
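To make the first of those cases concrete, here is a small sketch comparing a naive opening-tag pattern with one that skips over quoted attribute values. Both patterns and the names OpenTagDemo and firstMatch are illustrative assumptions, not code from this thread.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class OpenTagDemo {
    // Naive: treats the first '>' as the end of the tag, even inside quotes.
    static final Pattern NAIVE = Pattern.compile(
            "<script\\b[^>]*>", Pattern.CASE_INSENSITIVE);

    // Quote-aware: a quoted attribute value is consumed as a unit,
    // so a '>' inside it does not end the tag.
    static final Pattern QUOTE_AWARE = Pattern.compile(
            "<script\\b(?:[^>\"']|\"[^\"]*\"|'[^']*')*>", Pattern.CASE_INSENSITIVE);

    /** Returns the first match of the pattern in the input, or null if there is none. */
    static String firstMatch(Pattern p, String input) {
        Matcher m = p.matcher(input);
        return m.find() ? m.group() : null;
    }
}
```

On the input <script foo='bar>baz'>x, the naive pattern matches only <script foo='bar> while the quote-aware one matches the whole tag. Neither does anything about a </script> that sits inside a comment, as in the third example; that needs more than a pattern for the opening tag.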
 
Alan Moore
Ranch Hand
Posts: 262
Yeah, now I think about it, whitespace before the tag name isn't very likely (I shouldn't post when I'm in a hurry to be elsewhere). But I do see extra space before the ">" fairly often, even in end tags. And, although it's legal to have a ">" in a (quoted) attribute value, I never see that (HTML authors just can't seem to believe it's legal). I've seen your third example (with the double end tags) once, but only because the page came up blank and I viewed the source to find out why (it may be legal, but it's almost certain to be a mistake). As you say, though, this is probably all moot anyway.
 
Jim Yingst
Wanderer
Posts: 18671
I should have quoted you more carefully - I meant to respond specifically to the suggestion that there might be whitespace right after the <. I disagree with that (as have you, now), but I fully agree there can be whitespace before the >.

As for > in an attribute value - I've never seen that in HTML either. But I have seen it in some XML, e.g. something like

<field name='foo' value='some arbitrary text, could be just about anything'/>

If the value contains a quote, that would need to be escaped (as would < and &, which XML never allows unescaped in attribute values) - but I think anything else, including >, is fair game there.
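For illustration, a minimal sketch of escaping arbitrary text before placing it in a single-quoted XML attribute like the one above. Besides the quote, it also escapes & and <, since XML does not allow either unescaped inside attribute values; ">" is left alone. The names XmlAttr and escapeForSingleQuoted are hypothetical.

```java
final class XmlAttr {
    /** Escapes text for use inside a single-quoted XML attribute value. */
    static String escapeForSingleQuoted(String value) {
        return value
                .replace("&", "&amp;")    // first, so the entities added below are not re-escaped
                .replace("<", "&lt;")
                .replace("'", "&apos;");  // the delimiter used by the attribute
    }
}
```

For example, XmlAttr.escapeForSingleQuoted("it's a > b & c < d") returns "it&apos;s a > b &amp; c &lt; d".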

Anyway, sorry for the digressions Chetan - this is probably not stuff you need to worry about for HTML. Unless you need a really robust product able to handle any valid HTML.