capturing group, regexp

 
Stefan Wagner
Ranch Hand
Posts: 1923
Linux Postgres Database Scala
Well, I searched here and in other forums without success, and I can't work it out from reading the Javadocs.
In a recent thread, Max said 'yes, capturing groups work in "replaceAll"', but didn't show me my error.

I want to split a certain token like this:
<name>groovy</name>
to
<name>
groovy
</name>
(xml-tag, content, xml-tag)

I use string.replaceAll(regex, replacement) like this:
(the typical hardly readable regex syntax)

The result isn't what I intended, but:

The pattern is found, but the backreferences aren't interpreted the way I'd like.
How do I specify the capturing groups 1, 2, 3 correctly?
 
Stefan Wagner
Ranch Hand
Posts: 1923
Linux Postgres Database Scala
Found it myself (in JDC Tech Tips):

Even after knowing this, I don't find it in the Javadocs...
 
Jim Yingst
Wanderer
Sheriff
Posts: 18671
Even after knowing this, I don't find it in the Javadocs...

Start at the API for replaceAll() in String:
An invocation of this method of the form str.replaceAll(regex, repl) yields exactly the same result as the expression

    Pattern.compile(regex).matcher(str).replaceAll(repl)

This links to the API for replaceAll() in Matcher, which says:
The replacement string may contain references to captured subsequences as in the appendReplacement method.

So go to the appendReplacement() method:
The replacement string may contain references to subsequences captured during the previous match: Each occurrence of $g will be replaced by the result of evaluating group(g). The first number after the $ is always treated as part of the group reference. Subsequent numbers are incorporated into g if they would form a legal group reference. Only the numerals '0' through '9' are considered as potential components of the group reference. If the second group matched the string "foo", for example, then passing the replacement string "$2bar" would cause "foobar" to be appended to the string buffer. A dollar sign ($) may be included as a literal in the replacement string by preceding it with a backslash (\$).

There you go. Note that all three methods also contain the following note in their APIs:
Note that backslashes (\) and dollar signs ($) in the replacement string may cause the results to be different than if it were being treated as a literal replacement string. Dollar signs may be treated as references to captured subsequences as described above, and backslashes are used to escape literal characters in the replacement string.

OK, that's true, and it's good to be aware that the replacement is not literal, but the language here is too vague to really tell you how it works. "Dollar signs may be treated as references"? :roll: To get useful details, you need to follow the links to appendReplacement() as I showed above.
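
To make the appendReplacement() rules quoted above concrete, here is a minimal sketch applied to the tag-splitting question from the first post. The class name, method names, and the regex are illustrative assumptions, not the code Stefan posted; only String.replaceAll() and the $g syntax come from the API docs.

```java
// Illustrative sketch: GroupRefDemo and its regex are assumptions, not code from this thread.
class GroupRefDemo {

    // Three capturing groups: opening tag, text content, closing tag.
    // (Hypothetical pattern; it only handles simple tags without attributes.)
    static final String TAG_REGEX = "(<\\w+>)([^<]*)(</\\w+>)";

    // $1, $2 and $3 in the replacement refer to the groups captured by the match.
    static String splitTag(String input) {
        return input.replaceAll(TAG_REGEX, "$1\n$2\n$3");
    }

    // The Javadoc's own case: group 2 matched "foo" and the replacement is "$2bar".
    static String javadocExample() {
        return "xfoo".replaceAll("(x)(foo)", "$2bar");
    }

    public static void main(String[] args) {
        System.out.println(splitTag("<name>groovy</name>")); // tag, content, tag on three lines
        System.out.println(javadocExample());                // foobar
    }
}
```

Note that the group references use a dollar sign ($1), not a backslash (\1); the backslash form is only for backreferences inside the pattern itself.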
 
Stefan Wagner
Ranch Hand
Posts: 1923
Linux Postgres Database Scala
Thanks,
next time I will find it.

The last paragraph is giving me a headache:
'backslashes are used to escape literal characters'

Since when do I have to escape literal characters? Perhaps in specific replacement strings? Well, I'm not interested enough (since I solved my problem) to put up with the headache. Let it mean whatever it means!

thanks again...

P.S.: (OT) I read a thread where somebody wrote 'thanks to Jim' in connection with the ubb-code. Are you responsible for the ubb-code integration?
I'm using a German-language Java forum where they implemented syntax highlighting for Java. Though I don't think we really need it, it's a nice thing; maybe you want to have a look at it:
http://forum.javacore.de/viewtopic.php?t=668
 
Jim Yingst
Wanderer
Sheriff
Posts: 18671
Since when do I have to escape literal characters?

Ummm... since they created this method and told us in the API that backslashes and dollar signs would not (err, may not :roll: ) be treated as literal replacements? It's inconvenient, but at least it's documented.

Perhaps in specific replacement strings?

Yes, specifically in any replacement string containing $ or \, you need to escape those chars as \$ and \\ respectively.
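
A small sketch of that manual escaping, in case it helps. The class name and sample strings are made up for illustration; keep in mind that each backslash has to be doubled once more for the Java string literal.

```java
// Illustrative sketch: EscapeDemo and the sample strings are assumptions.
class EscapeDemo {

    // "\\$5" in Java source is the three characters \$5 : a literal dollar sign, then 5.
    static String literalDollar() {
        return "cost: PRICE".replaceAll("PRICE", "\\$5");
    }

    // "\\\\" in Java source is the two characters \\ : one literal backslash.
    static String literalBackslash() {
        return "a/b".replaceAll("/", "\\\\");
    }

    // Unescaped, "$5" is read as a reference to group 5, which this pattern does not have.
    static boolean unescapedDollarFails() {
        try {
            "cost: PRICE".replaceAll("PRICE", "$5");
            return false;
        } catch (IndexOutOfBoundsException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println(literalDollar());        // cost: $5
        System.out.println(literalBackslash());     // a\b
        System.out.println(unescapedDollarFails()); // true
    }
}
```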

Note that in JDK 1.5 the Matcher class has a new static quoteReplacement() method to make this easier.
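
For what it's worth, a minimal sketch of how quoteReplacement() can be used. The wrapper class and the replaceLiterally() helper are hypothetical names for illustration; Matcher.quoteReplacement() itself is the JDK 1.5 method mentioned above.

```java
import java.util.regex.Matcher;

// Illustrative sketch: QuoteReplacementDemo and replaceLiterally() are hypothetical names.
class QuoteReplacementDemo {

    // Treats 'literal' as plain text: quoteReplacement() escapes any $ or \ in it.
    static String replaceLiterally(String input, String regex, String literal) {
        return input.replaceAll(regex, Matcher.quoteReplacement(literal));
    }

    public static void main(String[] args) {
        System.out.println(replaceLiterally("cost: PRICE", "PRICE", "$5"));    // cost: $5
        System.out.println(replaceLiterally("dir=PATH", "PATH", "C:\\temp")); // dir=C:\temp
    }
}
```

This is mostly useful when the replacement text comes from user input or a file, where you can't escape it by hand.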

Are you responsible for the ubb-code integration?

Well, sorta. I have access to the code base we're running, and I sometimes make changes to it. (Which sometimes leads to unanticipated painful side effects, forcing me to change things back.) But it's a commercial package bought a long time ago, written in Perl, not very well factored for readability, with no unit tests available. So we tend to severely limit the changes we make to it. We've got an ongoing project for a message board system written in Java to replace this; several of the bartenders here have been very busy with this. Syntax highlighting is definitely a part of this system (once some other issues are resolved).
 