
Regular Expressions and String replacements

 
Norm Radder
I'm looking for ideas for a program I'm modifying that applies edits to groups of files. The data is HTML, and the edits change HREF= values that point to a server engine through query strings into local references. For example, given HREF="thesite/engine.php?id=22&topic=345", I want to produce HREF="Topic345/22.html".

I'm trying to use regex to solve this. The edit program gets its edit rules from a file. Up to now the edits have been simple replacements. Now I have pages that are more complicated.

Here's what I want to do. I'll use capital letters in place of the URL strings.

Source data: ABC997DEF2G
Desired output: MNOP2YZ997QX

The pattern for matching would be: ABC\d{1,3}DEF\d{1,2}G
The pattern has five parts: a constant, skipped digits, a constant, skipped digits, and a constant.

The replacement string for the above would be: MNOP<skip2>YZ<skip1>QX
Where <skip2> is the value skipped by \d{1,2} and <skip1> the value skipped by \d{1,3}.
The replacement rules would be: ABC by MNOP, DEF by YZ and G by QX.
These could be placed in an array: rep[]

When the Matcher returned by Pattern.matcher() finds a match, its start() and end() let me extract the area to work on.
If the skipped strings were in an array skip[], then the output record would be built by:
String outputRec = input.substring(0, matcher.start())
+ rep[0] + skip[1] + rep[1] + skip[0] + rep[2]
+ input.substring(matcher.end());

So how to do this?

The pattern would find the string and then use substring to extract the various parts of the string.
How to get the variable parts of the string that were skipped by \d{...}?
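
One common way to get at the skipped digits is to put parentheses around each \d{...} so that it becomes a capturing group, then read the values with Matcher.group(n). The sketch below assumes that approach; the class and method names (GroupExtract, rewriteFirst) are made up for illustration, and the replacement constants are hard-coded from the example above.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class GroupExtract {
    // Parentheses turn each \d{...} into a capturing group (1 and 2).
    static final Pattern FIND = Pattern.compile("ABC(\\d{1,3})DEF(\\d{1,2})G");

    // Rewrites the first match in the input; returns the input unchanged if none.
    static String rewriteFirst(String input) {
        Matcher m = FIND.matcher(input);
        if (!m.find()) {
            return input;
        }
        String skip1 = m.group(1); // digits matched by \d{1,3}
        String skip2 = m.group(2); // digits matched by \d{1,2}
        return input.substring(0, m.start())
                + "MNOP" + skip2 + "YZ" + skip1 + "QX"
                + input.substring(m.end());
    }

    public static void main(String[] args) {
        System.out.println(rewriteFirst("ABC997DEF2G")); // MNOP2YZ997QX
    }
}
```

This keeps the substring-based assembly described above and only changes where the variable parts come from.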

What would the rules for my edit program look like? These are input to my program.
For example:
Find: ABC\d{1,3}DEF\d{1,2}G
Use \ and } as delimiters for the variable part of the pattern. Find them with String.indexOf().
Replace: MNOP\<skip2>YZ\<skip1>QX
Use \< and > as delimiters for the data matched by the variable part of the pattern.
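
As a rough sketch of how rules in that form might be fed to the regex engine, the Find rule could be rewritten so every \d{...} becomes a capturing group, and each \<skipN> in the Replace rule could be turned into the $N group reference that Java's replacement strings understand. The helper names (RuleTranslate, toRegex, toReplacement, apply) are hypothetical, and the sketch assumes the constant parts of a rule contain no regex metacharacters, no $ and no other backslashes.

```java
final class RuleTranslate {
    // Wrap each \d{m}, \d{m,} or \d{m,n} in the Find rule in parentheses.
    static String toRegex(String findRule) {
        return findRule.replaceAll("(\\\\d\\{\\d+(?:,\\d*)?\\})", "($1)");
    }

    // Turn each \<skipN> in the Replace rule into the group reference $N.
    static String toReplacement(String replaceRule) {
        return replaceRule.replaceAll("\\\\<skip(\\d+)>", "\\$$1");
    }

    // Apply one Find/Replace rule pair to an input record.
    static String apply(String input, String findRule, String replaceRule) {
        return input.replaceAll(toRegex(findRule), toReplacement(replaceRule));
    }

    public static void main(String[] args) {
        String find = "ABC\\d{1,3}DEF\\d{1,2}G";        // as read from the rules file
        String replace = "MNOP\\<skip2>YZ\\<skip1>QX";  // as read from the rules file
        System.out.println(apply("ABC997DEF2G", find, replace)); // MNOP2YZ997QX
    }
}
```

A simpler alternative, if the rules file format is free to change, would be to write the parentheses and $N references directly in the rules and skip the translation step.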

Thanks for any ideas,
Norm
 
Henry Wong
Try...



Henry
 
Norm Radder
Thanks. I knew it was easy.
I missed the $1 variable in the String.replaceAll() method.
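
The snippet in Henry's reply is not preserved in this copy of the thread, so the following is only a sketch of the general technique Norm mentions, not a reconstruction of that reply. It applies String.replaceAll() with $1 and $2 group references to the HREF example from the first post. The class and method names (HrefRewrite, localize) are made up, and it assumes the query string always appears as id=...&topic=... in that order with a literal & rather than &amp;.

```java
final class HrefRewrite {
    // Group 1 captures the id digits, group 2 the topic digits.
    // The dot and question mark are escaped so they match literally.
    static String localize(String html) {
        return html.replaceAll(
                "HREF=\"thesite/engine\\.php\\?id=(\\d+)&topic=(\\d+)\"",
                "HREF=\"Topic$2/$1.html\"");
    }

    public static void main(String[] args) {
        String in = "<A HREF=\"thesite/engine.php?id=22&topic=345\">link</A>";
        System.out.println(localize(in));
        // <A HREF="Topic345/22.html">link</A>
    }
}
```

Because replaceAll() handles every match in the string, no manual start()/end() bookkeeping is needed for this case.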
 