Pattern matches but never replaces

 
Greenhorn
Posts: 26

This code works to find the JSTL tag pattern:




But no pattern substitution takes place. I do not understand why this
fails to replace the pattern but successfully finds the pattern. Any
ideas why? Is this more proof that Java can't handle regular
expressions?

[edit]Disable Smilies. Don't know whether I have removed the spaces correctly. CR[/edit]
[ August 25, 2008: Message edited by: Campbell Ritchie ]
 
Ranch Hand
Posts: 624
IntelliJ IDE Java
I'm surprised you are not getting an IllegalArgumentException thrown on your replacement string:



You have an illegal group reference in it. Remember, Java uses the '$' character to introduce group references in a replacement string. So the second '$' is telling Java to expect a group reference, but instead it is getting an opening brace.

I'm assuming you are trying to change



to



If so, you need to escape the '$' before the opening brace. So your replacement String becomes



Then when you put that in a Java String, you need to escape the '\', so it becomes:



See if that works for you.
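
As a minimal sketch of the escaping described above: the pattern below is a hypothetical three-group stand-in (the class and method names are also made up for illustration), and it assumes the goal is to turn a scriptlet expression into an EL expression. It shows the escaped replacement working and the unescaped one being rejected. Note that the replacement string is only parsed once a match is found, so no exception appears if the pattern never matches.

```java
import java.util.regex.Pattern;

class DollarEscapeDemo {
    // Hypothetical three-group pattern: (start of tag)(scriptlet body)(end of tag).
    static final Pattern TAG =
            Pattern.compile("(<c:param[^>]*value=\")<%=\\s*(.+?)%>(\"\\s*/>)");

    // "\\$" in Java source is \$ in the replacement string: a literal dollar sign.
    static String escaped(String jsp) {
        return TAG.matcher(jsp).replaceAll("$1\\${$2}$3");
    }

    // Returns true if the unescaped replacement is rejected once a match is found.
    static boolean unescapedThrows(String jsp) {
        try {
            TAG.matcher(jsp).replaceAll("$1${$2}$3");
            return false;
        } catch (IllegalArgumentException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        String jsp = "<c:param name=\"myName\" value=\"<%= myValue %>\" />";
        System.out.println(escaped(jsp));         // <c:param name="myName" value="${myValue }" />
        System.out.println(unescapedThrows(jsp)); // true
    }
}
```

If the replacement text should contain no group references at all, Matcher.quoteReplacement(...) escapes every '$' and '\' for you.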
 
Mark Vedder
Ranch Hand
Posts: 624
IntelliJ IDE Java
I just noticed in my reply (the output of which came from running your code) that there's a space in the variable expansion, between 'myValue' and the closing '}'.



You'll probably want to alter your regex so that the capturing group does not catch any trailing whitespace after the value attribute's value, so that you end up with:
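
One way to get there, sketched with a hypothetical pattern rather than the original poster's: make the capturing group reluctant and put a \s* outside the group, so the whitespace before %> is matched but not captured. The class and method names are illustrative only.

```java
import java.util.regex.Pattern;

class TrimCaptureDemo {
    // Hypothetical pattern: the reluctant (.+?) stops as early as it can,
    // and the \s* outside the group absorbs any whitespace before %>.
    static final Pattern TAG =
            Pattern.compile("(<c:param[^>]*value=\")<%=\\s*(.+?)\\s*%>(\"\\s*/>)");

    static String toEl(String jsp) {
        return TAG.matcher(jsp).replaceAll("$1\\${$2}$3");
    }

    public static void main(String[] args) {
        String jsp = "<c:param name=\"myName\" value=\"<%= myValue %>\" />";
        System.out.println(toEl(jsp)); // <c:param name="myName" value="${myValue}" />
    }
}
```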

 
Phil Powell
Greenhorn
Posts: 26
I thought of that and tried escaping the "$", to no avail. Still couldn't replace!
 
Mark Vedder
Ranch Hand
Posts: 624
IntelliJ IDE Java
The following is working for me:



Output:

 
Phil Powell
Greenhorn
Posts: 26
Thanks, it works now but very poorly.



Your suggestion about the backslashes only partially works in Java. It works only on the very last instance of the pattern found, even if I use replaceAll(). Put it in a continuous loop and it will clean up every instance, backwards, from the last to the first.

[edit]Disable smilies. I don't know whether I have deleted the correct spaces. CR[/edit]
[ August 25, 2008: Message edited by: Campbell Ritchie ]
 
Bartender
Posts: 4179
22
IntelliJ IDE Python Java
Just an obvious question, but you are writing the stuff back to the file, correct?
 
Phil Powell
Greenhorn
Posts: 26
Yes I am.

I found that the code I gave you doesn't consistently work across all instances of the <c:param> JSTL tag with the given pattern, but this one works and I have no idea why:



The difference is that the pattern apparently has to be WRONG for it to work in Java, combined with searching in the wrong direction via replaceFirst() inside a find() while loop.

[edit]Disable Smilies. CR[/edit]
[ August 25, 2008: Message edited by: Campbell Ritchie ]
 
Steve Luke
Bartender
Posts: 4179
22
IntelliJ IDE Python Java
It is like I said in the Sun Java forum thread you have: your regex is being too greedy. That .+ claims everything it possibly can. You need to be less greedy.

One solution is to force the first group to stop at the first '<' character that follows <c:param, before the closing >. For example, use [^<]+ instead of .+
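
To see the difference, here is a small sketch that looks only at the first capturing group. Both patterns are simplified, hypothetical versions of the one under discussion, and the class and helper names are made up for illustration. On a line holding two tags, the greedy .+ produces a single match that swallows the first tag, while [^<]+ produces one match per tag.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class GreedyGroupDemo {
    static final String LINE =
            "<c:param name=\"myName\" value=\"<%= myValue %>\" /> "
            + "<c:param name=\"foo\" value=\"<%= bar %>\" />";

    // Simplified, assumed patterns: only the first group is shown.
    static final String GREEDY = "(<c:param.+value=\")<%=";
    static final String NEGATED = "(<c:param[^<]+value=\")<%=";

    // Collects what group 1 captured for every match found in the input.
    static List<String> firstGroups(String regex, String input) {
        List<String> groups = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(input);
        while (m.find()) {
            groups.add(m.group(1));
        }
        return groups;
    }

    public static void main(String[] args) {
        System.out.println(firstGroups(GREEDY, LINE).size());  // 1
        System.out.println(firstGroups(NEGATED, LINE).size()); // 2
        System.out.println(firstGroups(NEGATED, LINE));
    }
}
```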


[ August 25, 2008: Message edited by: Steve Luke ]
 
Mark Vedder
Ranch Hand
Posts: 624
IntelliJ IDE Java
{Note: I was typing up my reply as Steve posted his and therefore did not see it until I was done. So there's a little bit of redundancy. Great minds think alike and all that... }

Originally posted by Phil Powell:
Thanks, it works now but very poorly.

Your suggestion about the backslashes only partially works in Java. It works only on the very last instance of the pattern found, even if I use replaceAll(). Put it in a continuous loop and it will clean up every instance, backwards, from the last to the first.



That has nothing to do with escaping a character in the replacement string; it has to do with the regex itself. If you put a logging statement in, you will see that the find method is only finding the last line. You need to indicate that your pattern is multiline:



That will at least find each pattern that occurs on a separate line. But it will still identify this line:



all as a single match and not two matches.

As such:
  • $1 = <c:param name="myName" value="<%= myValue %>" /> <c:param name="foo" value="
  • $2 = bar <-- with a trailing space
  • $3 = " />

So the end of the first desired match and the start of the second desired match are both being matched by the regex for the first capturing group, namely the .+

If we change that to a [^>]+ then we will only capture the desired start, but we'll also need to modify the third capturing group:

  • $1 = <c:param name="myName" value="
  • $2 = myValue <-- with a trailing space
  • $3 = " /> <c:param name="foo" value="<%= bar %>" />

Now the .* in the third group is capturing the second desired match. So we can change that group to ("[^>]+>) and everything will work. Also, the [ \t]* should probably be changed to [\s]* since a carriage return in the middle of an XML element is legal, as well as a space or tab.

Also, the regex does not take single-quoted attribute values into account (value='myValue'). So we want to change the " to a choice of ' or ".

So adding all our changes together we get:


That should do more along the lines of what you want. It may not be perfect; I'd do some unit testing on it to check for other missed cases.
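
Putting those changes together might look like the sketch below. The regex here is an assumption assembled from the changes listed above, not a copy of the one from the post, and the class and method names are hypothetical. It also trims whitespace around the captured value, as suggested earlier in the thread.

```java
import java.util.regex.Pattern;

class CombinedRegexDemo {
    // Assumed combination of the suggested changes:
    //   [^>]+   keeps group 1 inside a single tag
    //   ['"]    accepts either quote character
    //   \s*     allows any whitespace, including line breaks
    //   [^>]+>  ends group 3 at the tag's closing >
    static final Pattern TAG = Pattern.compile(
            "(<c:param[^>]+value=['\"])<%=\\s*([^%]+?)\\s*%>(['\"][^>]+>)");

    static String toEl(String jsp) {
        return TAG.matcher(jsp).replaceAll("$1\\${$2}$3");
    }

    public static void main(String[] args) {
        String line = "<c:param name=\"myName\" value=\"<%= myValue %>\" /> "
                + "<c:param name='foo' value='<%=bar%>'/>";
        // Both tags on the line are rewritten in a single replaceAll call.
        System.out.println(toEl(line));
    }
}
```

One limitation worth noting: ['"] does not check that the opening and closing quotes are the same character, and [^%]+? will not match a scriptlet whose body contains a '%'.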

There's a great tool available to help with writing regular expressions. It's called RegexBuddy. It does a lot. One of the nicest features is the ability to highlight not only the match, but what a capturing group is matching. It's a commercial product, but well worth the cost as it will save you hours of work.
[ August 25, 2008: Message edited by: Mark Vedder ]
     
    Ranch Hand
    Posts: 266

    Originally posted by Mark Vedder:
    {Note: I was typing up my reply as Steve posted his and therefore did not see it until I was done. So there's a little bit of redundancy. Great minds think alike and all that... }



    That has nothing to do with escaping a character in the replacement string. ...
    [ August 25, 2008: Message edited by: Mark Vedder ]



    But has everything to do with the poor performance the OP is talking about in reply #21. Always watch out for those greedy .* and .+ things in a regex!
     
    Phil Powell
    Greenhorn
    Posts: 26
    Please break this down to a simpler level; I can't follow anything you posted whatsoever, sorry



    [ August 27, 2008: Message edited by: Phil Powell ]
    [edit]Disable Smilies. CR[/edit]
    [ September 02, 2008: Message edited by: Campbell Ritchie ]
     
    Steve Luke
    Bartender
    Posts: 4179
    22
    IntelliJ IDE Python Java
    Instead of asking Mark to repost everything in more detail, you should ask specific questions. What don't you understand?

    What I think you will benefit from is thoroughly reading the API and a good Java/Regex tutorial.

    Your statement "break this down to a simpler level; I can't follow anything you posted whatsoever" looks to me like you are asking Mark to write a tutorial response, which isn't really practical on a forum, especially since the response Mark already wrote was so detailed.
     
    Piet Verdriet
    Ranch Hand
    Posts: 266
    - f%$@1ng forum with it's smiley's! -
    [ August 27, 2008: Message edited by: Piet Verdriet ]
     
    Piet Verdriet
    Ranch Hand
    Posts: 266
    : p

    Originally posted by Phil Powell:
    Please break this down to a simpler level; I can't follow anything you posted whatsoever, sorry
    ...



    Okay, you never gave an exact example of input AND output, so I might be wrong, but if I understand you correctly, you want to replace substrings like this:

    <c:param name="AAA" value="<%= aaa %>" />

    into:

    <c:param name="AAA" value="aaa" />

    correct?

    If so, you don't need to create any Matcher: replaceAll(...) on a String is sufficient.

    Try this:



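    The (?s) and (?m) flags in front of the regex change how replaceAll(...) matches: (?s) lets the DOT match any character, including line terminators, so a match can span multiple lines; (?m) makes ^ and $ match at the start and end of each line.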
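
    A rough sketch of that idea follows. The regex is an illustrative assumption, not the one originally posted, and the class and method names are hypothetical. It uses String.replaceAll with reluctant quantifiers and shows that a tag split across two lines is only rewritten when (?s) is present.

```java
class InlineFlagDemo {
    // Assumed pattern using reluctant quantifiers (.*? and .+?).
    static final String REGEX =
            "(<c:param.*?value=\")<%=\\s*(.+?)\\s*%>(\".*?/>)";

    // Optionally prefixes the (?s) inline flag so that '.' also matches line breaks.
    static String strip(String jsp, boolean dotAll) {
        String regex = dotAll ? "(?s)" + REGEX : REGEX;
        return jsp.replaceAll(regex, "$1$2$3");
    }

    public static void main(String[] args) {
        String jsp = "<c:param name=\"AAA\"\n    value=\"<%= aaa %>\" />";
        System.out.println(strip(jsp, true));  // scriptlet markers removed
        System.out.println(strip(jsp, false)); // unchanged: '.' cannot cross the line break
    }
}
```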
    [ August 27, 2008: Message edited by: Piet Verdriet ]
     
     
    Phil Powell
    Greenhorn
    Posts: 26

    Originally posted by Piet Verdriet:
    : p

    Okay, you never gave an exact example of input AND output, so I might be wrong, but if I understand you correctly, you want to replace substrings like this:

    <c:param name="AAA" value="<%= aaa %>" />

    into:

    <c:param name="AAA" value="aaa" />

    correct?

    If so, you don't need to create any Matcher: replaceAll(...) on a String is sufficient.

    Try this:



    >> please explain ".*?" and ".+?", that sounds contradictory inasmuch as that sounds like "0 or more characters and 0 or 1 character" along with "1 or more characters and 0 or 1 character"; that does not make sense to me.

    >> Thanks for the explanation on ?s and ?m


    The (?s) and (?m) flags in front of the regex, tell replaceAll(...) to let the DOT match any character and matches can span over multiple lines.

    [ August 27, 2008: Message edited by: Piet Verdriet ]

     
    Piet Verdriet
    Ranch Hand
    Posts: 266

    Originally posted by Phil Powell:

    please explain ".*?" and ".+?", that sounds contradictory inasmuch as that sounds like "0 or more characters and 0 or 1 character" along with "1 or more characters and 0 or 1 character"; that does not make sense to me.



    It appears you know very little regex (none at all?). If you need to know what these elementary building blocks of the regex language mean, you should work through a basic tutorial:

    http://www.regular-expressions.info/tutorial.html
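
    In short, a ? placed directly after * or + does not mean "0 or 1"; it makes the quantifier reluctant, so it matches as few characters as possible instead of as many as possible. A small sketch (the class and helper names are hypothetical):

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class ReluctantDemo {
    // Returns the first match of regex in input, or null if there is none.
    static String firstMatch(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        String s = "value=\"a\" other=\"b\"";
        System.out.println(firstMatch("\".*\"", s));         // "a" other="b"  (greedy)
        System.out.println(firstMatch("\".*?\"", s));        // "a"            (reluctant)
        System.out.println(firstMatch("\".*?\"", "x=\"\"")); // ""   (*? may match nothing)
        System.out.println(firstMatch("\".+?\"", "x=\"\"")); // null (+? needs at least one character)
    }
}
```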
    [ September 02, 2008: Message edited by: Piet Verdriet ]
     