A regex question

 
Ranch Hand
Posts: 157
What should this expression evaluate to?

"000hello".replaceAll("^(?=0)0", "&");

Java 1.6 evaluates it to "&00hello", which I think is incorrect. Just need a second pair of eyes!

While I am at it... given a String "00043.30", what's the easiest way to format it to "&&&43.3& "? (It's really going to be spaces, but this forum filters them away). Using any means available in Java, such as regex, DecimalFormat, etc. The most straightforward way (with character tweaking) does not look very elegant.




 
Ranch Hand
Posts: 188
I don't know what application needs to convert zeros to '&' rather than remove them. In your pattern "^(?=0)0", the first character can be '0' or not there at all, then the second character should be '0', followed by whatever. So, in your test case, the first zero, preceded by nothing, matches and is replaced by '&'. If you want output like "&&&43.3&", then the pattern you are looking for is one that matches any zero and replaces it with '&'.
 
Jane Jukowsky
Ranch Hand
Posts: 157
Like I said, I am really replacing with spaces, not "&", but this bulletin board is not very kind to spaces.

In your pattern "^(?=0)0", first character can be '0' or not at all


That's not what my tutorial says: "Positive lookahead works just the same. q(?=u) matches a 'q' that is followed by a 'u', without making the 'u' part of the match. "

(http://www.regular-expressions.info/lookaround.html)

In any case, why isn't the 2nd zero replaced?



If you want the output like "&&&43.3&" then pattern you are looking for is to match any zero and replace that with '&'


Not any zero, only leading and trailing zeroes. Think "00403.30" ==> "&&403.3&"
 
Jane Jukowsky
Ranch Hand
Posts: 157
Here is how to get rid of the leading zeroes:

"000hello".replaceAll("(?<=^0*)0", "&");

Is it efficient? (though I am not using very long numbers, so it's just a theoretical question)
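
For comparison, here is a minimal sketch of the same idea with a bounded look-behind instead of the star. The bound of 10 zeros and the class and method names are assumptions made for illustration; they are not from the post above.

```java
final class LeadingZeroLookbehind {

    // Hypothetical bound: at most 10 zeros may precede the zero being replaced.
    // The look-behind only succeeds if everything between the start of the
    // input and the current position is a run of zeros.
    private static final String LEADING_ZERO = "(?<=^0{0,10})0";

    static String maskLeadingZeros(String input) {
        return input.replaceAll(LEADING_ZERO, "&");
    }

    public static void main(String[] args) {
        System.out.println(maskLeadingZeros("000hello")); // &&&hello
        System.out.println(maskLeadingZeros("00403.30")); // &&403.30
    }
}
```

Only the leading run is touched; zeros in the middle or at the end are left alone because something other than a zero sits between them and the start of the string.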
 
Rahul P Kumar
Ranch Hand
Posts: 188
Oh, I am sorry, I overlooked that it is "(?=0)" and not "(0?)". From your link:

You can use any regular expression inside the lookahead. (Note that this is not the case with lookbehind. I will explain why below.) Any valid regular expression can be used inside the lookahead. If it contains capturing parentheses, the backreferences will be saved. Note that the lookahead itself does not create a backreference. So it is not included in the count towards numbering the backreferences. If you want to store the match of the regex inside a backreference, you have to put capturing parentheses around the regex inside the lookahead, like this: (?=(regex)). The other way around will not work, because the lookahead will already have discarded the regex match by the time the backreference is to be saved.



So the first zero matched; the second zero also matched, but then the backreference to the first zero is lost. The overall match was successful, however. After that, any further zeros fail to match because of the boundary matcher '^', and thus only one zero was replaced. Whoa, that explanation came out of the blue and needs rationalization.
 
Jane Jukowsky
Ranch Hand
Posts: 157
You are right. My mistake was using a lookahead instead of a lookbehind.

Here is the code that does what I need:
 
Author
Posts: 12617
FYI, the forum doesn't "filter spaces away". That's just the way HTML works.
 
Jane Jukowsky
Ranch Hand
Posts: 157
Well, it could have escaped the spaces, no? I have just discovered that it does so in the [ code ] section, so perhaps that's what I should have used.
 
Ranch Hand
Posts: 266

Jane Dodo wrote:What should this expression evaluate to?

"000hello".replaceAll("^(?=0)0", "&");

Java 1.6 evaluates it to "&00hello", which I think is incorrect. Just need a second pair of eyes!



Not surprisingly, Java is correct. The ^ meta character means "the start of the string". Now, there is only one zero at the start of the string, so only one is replaced.
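
A small sketch to confirm this, counting the matches as well as doing the replacement. The class and method names are made up for the example.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class AnchoredLookaheadDemo {

    // Counts how many times the regex matches in the input.
    static int countMatches(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        int count = 0;
        while (m.find()) {
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        String input = "000hello";
        // '^' pins the match to offset 0, so at most one zero can ever match.
        System.out.println(countMatches("^(?=0)0", input));   // 1
        System.out.println(input.replaceAll("^(?=0)0", "&")); // &00hello
        // The lookahead adds nothing here: "^0" gives the same result.
        System.out.println(input.replaceAll("^0", "&"));      // &00hello
    }
}
```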

Jane Dodo wrote:While I am at it... given a String "00043.30", what's the easiest way to format it to "&&&43.3& "? (It's really going to be spaces, but this forum filters them away). Using any means available in Java, such as regex, DecimalFormat, etc. The most straightforward way (with character tweaking) does not look very elegant.



So you want to replace all leading and trailing zeros? Here's a regex solution:
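
What follows is only a sketch of one \G-based pattern that replaces leading and trailing zeros; it is not necessarily the snippet originally posted. The pattern, class name and method name are all assumptions.

```java
import java.util.regex.Matcher;

final class EdgeZeroMask {

    // Assumed pattern, two alternatives:
    //   \G0       a zero at the start of the input or directly after the
    //             previous match, i.e. part of the leading run
    //   0(?=0*$)  a zero with nothing but zeros between it and the end,
    //             i.e. part of the trailing run
    private static final String EDGE_ZERO = "\\G0|0(?=0*$)";

    static String maskEdgeZeros(String number, String filler) {
        // quoteReplacement keeps '$' and '\' in the filler literal.
        return number.replaceAll(EDGE_ZERO, Matcher.quoteReplacement(filler));
    }

    public static void main(String[] args) {
        System.out.println(maskEdgeZeros("00043.30", "&")); // &&&43.3&
        System.out.println(maskEdgeZeros("00403.30", "&")); // &&403.3&
    }
}
```

Note that a number without a decimal point, such as "100", would also lose its trailing zeros under this pattern, so it only suits inputs that always carry a fractional part.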



As you see, the middle zeros are not replaced. But that will most probably look like voodoo to you. Better do it manually.
 
Piet Verdriet
Ranch Hand
Posts: 266

Jane Dodo wrote:You are right. My mistake was using a lookahead instead of a lookbehind.

Here is the code that does what I need:



That does not work for two reasons:
1 - Java does not support look behinds with infinite repetition (so no '*' and '+' inside look behinds!);
2 - If the infinite look behinds DID work, the first 0 in your string would have been replaced, but the second zero would not be replaced because it would not have a 0 to its left (because you just replaced it!).
 
Marshal
Posts: 80254
Try "&nbsp;", which is a non-breaking space and looks like this:     and see what happens to your spaces. Of course you can't tell by looking how many spaces there are (1 ordinary, 3 non-breaking, and 1 ordinary, in fact).
 
Jane Jukowsky
Ranch Hand
Posts: 157


That does not work for two reasons:
1 - Java does not support look behinds with infinite repetition (so no '*' and '+' inside look behinds!);
2 - If the infinite look behinds DID work, the first 0 in your string would have been replaced, but the second zero would not be replaced because it would not have a 0 to its left (because you just replaced it!).



Don't know, seems to work fine for me!
 
Piet Verdriet
Ranch Hand
Posts: 266

Jane Dodo wrote:


That does not work for two reasons:
1 - Java does not support look behinds with infinite repetition (so no '*' and '+' inside look behinds!);
2 - If the infinite look behinds DID work, the first 0 in your string would have been replaced, but the second zero would not be replaced because it would not have a 0 to its left (because you just replaced it!).



Don't know, seems to work fine for me!



Well I'll be damned, it does!

Here's what regular-expressions.info says: "Java takes things a step further by allowing finite repetition. You still cannot use the star or plus, but you can use the question mark and the curly braces with the max parameter specified. Java recognizes the fact that finite repetition can be rewritten as an alternation of strings with different, but fixed lengths.".
-- http://www.regular-expressions.info/lookaround.html

And about point 2: I truly thought that the regex engine performed its replacements from left to right and that these replacements influenced the characters to the right of it.

Interesting, and thank you for following up: I'm going to see if I can find out if perhaps some things have changed lately.

Regards,

Piet.
 
author
Posts: 23958

Piet Verdriet wrote:
Interesting, and thank you for following up: I'm going to see if I can find out if perhaps some things have changed lately.



I don't know if it "changed lately", or was always like this... but I thought the infinite repetition restriction in look-aheads and look-behinds only applied to the split() method.... meaning... I always remember being able to use * and + in look-aheads and look-behinds, since regex was introduced in Java 1.4 (as long as they are not used in the split() method).

Piet Verdriet wrote:And about point 2: I truly thought that the regex engine performed its replacements from left to right and that these replacements influenced the characters to the right of it.



Never thought about this... But it does make sense that it will work though. Strings are immutable. And under the covers, replaceAll() uses the appendReplacement() and appendTail() methods, which use a separate string buffer to create the result string.

Henry
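
A rough sketch of that loop, written out by hand. It shows the mechanism described above and is not the JDK source; the class and method names are invented for the example.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class ManualReplaceAll {

    // Mimics replaceAll(): the matcher keeps scanning the ORIGINAL input,
    // while the result is assembled in a separate buffer.
    static String replaceAllByHand(String input, String regex, String replacement) {
        Matcher m = Pattern.compile(regex).matcher(input);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            // Copies the text since the previous match, then the replacement.
            m.appendReplacement(out, replacement);
        }
        // Copies whatever is left after the last match.
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        // "(?<=0)0" replaces a zero that has a zero to its left.
        // The third zero still sees the original '0' on its left, not the '&'.
        System.out.println(replaceAllByHand("000hello", "(?<=0)0", "&")); // 0&&hello
    }
}
```

If replacements were written back into the text being searched, the third zero would see '&' on its left and the result would be "0&0hello" instead.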
 
Piet Verdriet
Ranch Hand
Posts: 266

Henry Wong wrote:

Piet Verdriet wrote:
Interesting, and thank you for following up: I'm going to see if I can find out if perhaps some things have changed lately.



I don't know if it "changed lately", or was always like this... but I thought the infinite repetition restriction in look-aheads and look-behinds only applied to the split() method.... meaning... I always remember being able to use * and + in look-aheads and look-behinds, since regex was introduced in Java 1.4 (as long as they are not used in the split() method).

Piet Verdriet wrote:And about point 2: I truly thought that the regex engine performed its replacements from left to right and that these replacements influenced the characters to the right of it.



Never thought about this... But it does make sense that it will work though. Strings are immutable. And under the covers, replaceAll() uses the appendReplacement() and appendTail() methods, which use a separate string buffer to create the result string.

Henry



Not sure about the split(...), I'll look into that.
About look-aheads: AFAIK, that has always worked with both + and *, it was only the look-behinds that were restricted in Java (and many other languages for that matter).
 
Piet Verdriet
Ranch Hand
Posts: 266

Henry Wong wrote:

Piet Verdriet wrote:
Interesting, and thank you for following up: I'm going to see if I can find out if perhaps some things have changed lately.



I don't know if it "changed lately", or was always like this... but I thought the infinite repetition restriction in look-aheads and look-behinds only applied to the split() method.... meaning... I always remember being able to use * and + in look-aheads and look-behinds, since regex was introduced in Java 1.4 (as long as they are not used in the split() method).



It doesn't matter what method you use: matches(), replaceAll() and split() all behave the same. But it gets a bit strange (in my opinion). See the test below:



When you run this test, you'll see that 1, 2 and 3 run without a hitch, yet 4, 5 and 6 produce exceptions...
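
One way to run this kind of experiment yourself is a small probe that reports whether a pattern compiles. This is a hypothetical sketch: the helper and the candidate patterns are assumptions, not the six cases from the test above, and no results are shown for the unbounded look-behinds because they may differ between Java versions.

```java
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

final class PatternProbe {

    // Returns true if the regex compiles, false if Pattern.compile rejects it.
    static boolean compiles(String regex) {
        try {
            Pattern.compile(regex);
            return true;
        } catch (PatternSyntaxException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // Illustrative candidates: one bounded and two unbounded look-behinds.
        String[] candidates = { "(?<=^0{0,10})0", "(?<=^0*)0", "(?<=0+)0" };
        for (String regex : candidates) {
            System.out.println(regex + " compiles: " + compiles(regex));
        }
    }
}
```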

Henry Wong wrote:

Piet Verdriet wrote:And about point 2: I truly thought that the regex engine performed its replacements from left to right and that these replacements influenced the characters to the right of it.



Never thought about this... But it does make sense that it will work though. Strings are immutable. And under the covers, replaceAll() uses the appendReplacement() and appendTail() methods, which use a separate string buffer to create the result string.

Henry



That sounds reasonable.
 
Piet Verdriet
Ranch Hand
Posts: 266
For an explanation of the * and + being sometimes valid and sometimes invalid inside a look-behind, see: http://stackoverflow.com/questions/1536915/regex-look-behind-without-obvious-maximum-length-in-java
 
Henry Wong
author
Posts: 23958

Piet Verdriet wrote:For an explanation of the * and + being sometimes valid and sometimes invalid inside a look-behind, see: http://stackoverflow.com/questions/1536915/regex-look-behind-without-obvious-maximum-length-in-java



That's actually a very enlightening article, Piet. Thanks...

Henry
 
Piet Verdriet
Ranch Hand
Posts: 266

Henry Wong wrote:

Piet Verdriet wrote:For an explanation of the * and + being sometimes valid and sometimes invalid inside a look-behind, see: http://stackoverflow.com/questions/1536915/regex-look-behind-without-obvious-maximum-length-in-java



That's actually a very enlightening article, Piet. Thanks...

Henry



I thought you would.
You're welcome of course!
 
Jane Jukowsky
Ranch Hand
Posts: 157

An interesting article. Any idea WHY Java does it that way? I mean, why not just count bytes as it goes and throw an exception in an unlikely event when the matched string (or whatever) is longer than Integer.MAX_VALUE characters? But then, I am a regex newbie.

 
Piet Verdriet
Ranch Hand
Posts: 266

Jane Dodo wrote:
An interesting article. Any idea WHY Java does it that way? I mean, why not just count bytes as it goes and throw an exception in an unlikely event when the matched string (or whatever) is longer than Integer.MAX_VALUE characters? But then, I am a regex newbie.



Good point. I was wondering the same. It seems this is an accepted bug*. When a * or + is used inside a look-behind, an exception should be thrown! @OP: I'd advise against your solution and have a look at my earlier suggestion (the one with the \G in it).

http://bugs.sun.com/view_bug.do?bug_id=6695369
 
Jane Jukowsky
Ranch Hand
Posts: 157

Piet Verdriet wrote:
! @OP: I'd advise against your solution and have a look at my earlier suggestion (the one with the \G in it).



Does not work for me, as I only need to replace leading and trailing (after the decimal point) zeroes, i.e. 00012003.40 should become 12003.4.
BTW, if there is a formatter that does that (i.e. in the process of converting a double to a String) that would be even cooler.
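
If stripping the zeros (rather than padding with spaces) is all that is needed, one option that avoids regex is java.math.BigDecimal. A minimal sketch, assuming the input is always a plain decimal number; the class and method names are invented for the example.

```java
import java.math.BigDecimal;

final class ZeroTrimmer {

    // Parsing drops the leading zeros, stripTrailingZeros() drops the trailing
    // ones, and toPlainString() avoids scientific notation such as 1E+2.
    static String trimZeros(String number) {
        return new BigDecimal(number).stripTrailingZeros().toPlainString();
    }

    public static void main(String[] args) {
        System.out.println(trimZeros("00012003.40")); // 12003.4
        System.out.println(trimZeros("00043.30"));    // 43.3
    }
}
```

This does not keep the column width, so it would not replace the space-padding version discussed earlier.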
 
Piet Verdriet
Ranch Hand
Posts: 266

Jane Dodo wrote:

Piet Verdriet wrote:
! @OP: I'd advise against your solution and have a look at my earlier suggestion (the one with the \G in it).



Does not work for me, as I only need to replace leading and trailing (after the decimal point) zeroes, i.e. 00012003.40 should become 12003.4.
BTW, if there is a formatter that does that (i.e. in the process of converting a double to a String) that would be even cooler.



Sorry Jane, I forgot you were the OP.

I don't follow you exactly, you wanted to replace "000400003.300" with "&&&400003.3&&" where the '&'-s are white spaces, right? If so, then my earlier suggestion does exactly that:



AFAIK, there is no Formatter in Java that does exactly as the above.
 