String.replace is to \ as LISP is to ()?

 
Joe Areeda
Greetings,

I'm missing something very basic.

My problem is I don't have a clue why I need so many backslashes in a String.replace call.

I'm using a JavaScript editor (TinyMCE) to add help messages to an application. To set the initial value of the edit area I need a little JavaScript that assigns a multiline string constant. So I have to escape the \r\n that comes out of the editor by putting a backslash before the newline character, and replace any single quotes in the text with \'

The statement that works is:


The question is: why do I need all the \'s around \n? I suspect the answer is that these strings are getting "unescaped" multiple times, but I would like to understand how many times and where that happens, so next time I can guess instead of writing a test program and adding another backslash until it works.

Thanks,

Joe
 
Henry Wong
Joe Areeda wrote: The question is why do I need all the \'s around \n?

Basically, there are two things going on...

First, the backslash has a special meaning in a regex pattern and in a regex replacement string. So if you want a literal backslash in either one, you need to escape it with another backslash. If you had stored the pattern or replacement string somewhere else, such as a file, that is all you would need to do.

However, and second, you did not get these strings from a file. You used a string literal, and in a Java string literal the backslash also has a special meaning. So each backslash that should reach the regex engine has to be escaped again in the source code.

So, the literal backslash has to be escaped, becoming two backslashes, for the regex engine to understand it. And each of those two backslashes has to be escaped, becoming four backslashes, so that they can be represented in the string literal.
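
A minimal sketch of those two layers, assuming the call in question is the regex-based String.replaceAll (the original statement is not shown in the thread, so the class and method names below are made up for illustration). It also shows String.replace, which treats both arguments literally and so only needs the string-literal layer.

```java
// Hypothetical helper class; none of these names come from the original post.
final class BackslashLayers {

    // Regex version: source "\\\\'" -> string \\' -> replacement text \'
    static String escapeQuotes(String text) {
        return text.replaceAll("'", "\\\\'");
    }

    // Literal version: String.replace(CharSequence, CharSequence) does no regex
    // processing, so only the string-literal layer applies: "\\'" -> \'
    static String escapeQuotesLiteral(String text) {
        return text.replace("'", "\\'");
    }

    // Put one backslash in front of every CR LF pair.
    // Source "\\\\\r\n" -> string \\ CR LF -> replacement text \ CR LF
    static String continueLines(String text) {
        return text.replaceAll("\r\n", "\\\\\r\n");
    }

    // Double every backslash: four in the pattern literal match one character,
    // eight in the replacement literal produce two characters.
    static String doubleBackslashes(String text) {
        return text.replaceAll("\\\\", "\\\\\\\\");
    }
}
```

Calling BackslashLayers.escapeQuotes("it's") and BackslashLayers.escapeQuotesLiteral("it's") both give the five characters it\'s; the regex version just needs twice as many backslashes in the source to get there.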

Henry


PS... and I may be starting a flame war here. I have seen cases where there weren't enough escapes, yet the regex still worked. The reason is that some raw control characters (such as tab or LF) are valid in a regex and simply match themselves. So even though the pattern wasn't escaped correctly, and the raw character was sent in instead of the readable escape sequence, it still worked.
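
A small sketch of that effect, using hypothetical method names. In the pattern, both spellings end up matching a line feed. The replacement string is less forgiving: there, a backslash just makes the next character literal.

```java
// Hypothetical demo class; the names are illustrative only.
final class NewlineEscapes {

    // "\\n" reaches the regex engine as backslash + n, the regex escape for LF.
    static String viaRegexEscape(String s) {
        return s.replaceAll("\\n", "<LF>");
    }

    // "\n" reaches the regex engine as a raw LF character, which matches itself.
    static String viaRawChar(String s) {
        return s.replaceAll("\n", "<LF>");
    }

    // In the replacement, "\\n" is backslash + n, and the backslash only
    // escapes the n, so the result contains a plain letter n, not a newline.
    static String replacementGotcha(String s) {
        return s.replaceAll("x", "\\n");
    }
}
```

With the input "axb", replacementGotcha returns "anb", which is why an under-escaped pattern can appear to work while an under-escaped replacement does not.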
 
Joe Areeda
Thanks Henry!

I think I get it now.
 
Winston Gutkowski
Joe Areeda wrote: The question is why do I need all the \'s around \n?

It is a major pain, that's for sure; especially when you're dealing specifically with backslash characters.

One possibility: If these paths only need to work in Java, you don't need backslashes. They are specifically (and only, AFAIK) a Windows requirement; pretty much every other system on the planet uses forward slashes, and Java doesn't care which you use.

Also: there are several characters that have special meanings in regular expressions, but for many of them there is an alternative to escaping: square brackets.

For example, if you want to replace a '$'-sign, there are two ways to do it:
someText.replaceAll("\\$", "dollar");
or
someText.replaceAll("[$]", "dollar");
I have a slight preference for the latter, because I don't have to remember all those damn repeated backslashes, and it's actually more "visual" to me. You can only do it in the left-hand String (the pattern) though; square brackets have no special meaning in the replacement.
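
For comparison, here is a sketch that puts the two forms above next to the standard library's own quoting helpers, Pattern.quote for the pattern side and Matcher.quoteReplacement for the replacement side. The class and method names are hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class; the names are illustrative only.
final class DollarDemo {

    static String withBackslash(String s) {
        return s.replaceAll("\\$", "dollar");
    }

    static String withBrackets(String s) {
        return s.replaceAll("[$]", "dollar");
    }

    // Pattern.quote builds a pattern that matches its argument literally.
    static String withQuote(String s) {
        return s.replaceAll(Pattern.quote("$"), "dollar");
    }

    // On the replacement side, '$' and '\' are special; quoteReplacement
    // escapes them so the text is inserted as-is.
    static String toDollarSign(String s) {
        return s.replaceAll("dollar", Matcher.quoteReplacement("$"));
    }

    // Brackets do not help with the backslash itself: it is still an escape
    // character inside a character class, so four are needed in the source.
    static String bracketedBackslash(String s) {
        return s.replaceAll("[\\\\]", "/");
    }
}
```

The first three methods turn "5$" into "5dollar", and toDollarSign turns it back into "5$".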

HIH

Winston
 