
Regular Expressions and String replacements

Posts: 3353
I'm looking for ideas for a program I'm modifying that runs edits over groups of files. The data is HTML, and the edits turn HREF= links that point to a server engine through query strings into local references. For example, HREF="thesite/engine.php?id=22&topic=345" should become HREF="Topic345/22.html".
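For the concrete HREF case, a minimal sketch using capturing groups might look like this. It assumes the links appear exactly as in the example above (uppercase HREF, the literal prefix thesite/engine.php, a raw & rather than &amp;, id before topic); the class and method names are hypothetical.

```java
import java.util.regex.Pattern;

class HrefRewrite {
    // Assumed link shape: HREF="thesite/engine.php?id=<digits>&topic=<digits>"
    // Group 1 captures the id value, group 2 captures the topic value.
    static final Pattern ENGINE_HREF =
        Pattern.compile("HREF=\"thesite/engine\\.php\\?id=(\\d+)&topic=(\\d+)\"");

    static String rewrite(String html) {
        // $2 and $1 in the replacement insert the captured topic and id.
        return ENGINE_HREF.matcher(html).replaceAll("HREF=\"Topic$2/$1.html\"");
    }

    public static void main(String[] args) {
        String in = "<a HREF=\"thesite/engine.php?id=22&topic=345\">link</a>";
        System.out.println(rewrite(in)); // prints <a HREF="Topic345/22.html">link</a>
    }
}
```

Real pages may need Pattern.CASE_INSENSITIVE or an &amp; alternative in the pattern.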

I'm trying to use regex to solve this. The edit program reads its edit rules from a file. Up to now the edits have been simple literal replacements, but now I have pages that need more complicated ones.

Here's what I want to do. I'll use capital letters instead of URL strings.

Source data: ABC997DEF2G
Desired output: MNOP2YZ997QX

The pattern for matching would be: ABC\d{1,3}DEF\d{1,2}G
The pattern has five parts: constant, skipped digits, constant, skipped digits, and constant.

The replacement string for the above would be: MNOP<skip2>YZ<skip1>QX,
where <skip2> is the value matched by \d{1,2} and <skip1> is the value matched by \d{1,3}.
The replacement rules would be: ABC by MNOP, DEF by YZ and G by QX.
These could be placed in an array: rep[]
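One way to express this without rep[] and substring() is to wrap each \d{...} in parentheses, which makes it a capturing group, and refer to the groups as $1 and $2 in the replacement string passed to String.replaceAll(). A minimal sketch (class and method names are hypothetical):

```java
class GroupSwap {
    static String swap(String input) {
        // Group 1 = the \d{1,3} digits, group 2 = the \d{1,2} digits.
        // The replacement emits group 2 before group 1, swapping their order.
        return input.replaceAll("ABC(\\d{1,3})DEF(\\d{1,2})G", "MNOP$2YZ$1QX");
    }

    public static void main(String[] args) {
        System.out.println(swap("ABC997DEF2G")); // prints MNOP2YZ997QX
    }
}
```

Text outside the match is left unchanged, so "xxABC997DEF2Gyy" becomes "xxMNOP2YZ997QXyy".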

When Matcher.find() finds a match, start() and end() give me the bounds of the area to work on.
If the skipped strings were in an array skip[], the output record would be built by:
String outputRec = input.substring(0, matcher.start())
+ rep[0] + skip[1] + rep[1] + skip[0] + rep[2]
+ input.substring(matcher.end());
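Once each \d{...} is in parentheses, Matcher.group(n) returns the text it matched, which is exactly what skip[] needs. A minimal sketch of the substring() construction above, assuming the pattern is rewritten as ABC(\d{1,3})DEF(\d{1,2})G and that only the first match in a record is replaced (class and method names are hypothetical):

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class ManualBuild {
    static String build(String input) {
        String[] rep = {"MNOP", "YZ", "QX"};
        Matcher matcher = Pattern.compile("ABC(\\d{1,3})DEF(\\d{1,2})G").matcher(input);
        if (!matcher.find()) {
            return input; // no match: leave the record unchanged
        }
        // group(n) is the text matched by the n-th parenthesized part of the pattern.
        String[] skip = {matcher.group(1), matcher.group(2)};
        return input.substring(0, matcher.start())
            + rep[0] + skip[1] + rep[1] + skip[0] + rep[2]
            + input.substring(matcher.end());
    }

    public static void main(String[] args) {
        System.out.println(build("<<ABC997DEF2G>>")); // prints <<MNOP2YZ997QX>>
    }
}
```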

So how do I do this?

The pattern would find the string, and then substring() would extract its various parts.
But how do I get the variable parts of the string that were matched by \d{...}?

What would the rules for my edit program look like? They are read by the program as input.
For example:
Find: ABC\d{1,3}DEF\d{1,2}G
Use \ and } as delimiters for the variable part of the pattern. Find them with String.indexOf().
Replace: MNOP\<skip2>YZ\<skip1>QX
Use \< and > as delimiters for the data matched by the variable part of the pattern.
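Rules in this format can be translated mechanically into a capturing-group regex and a replaceAll() replacement string. A minimal sketch, assuming the variable parts in Find are always of the form \d{m} or \d{m,n}, that the rest of the Find pattern has no other capturing groups (so <skipN> means the N-th \d{...}), and that placeholders are written exactly as \<skipN>; the class and method names are hypothetical:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class RuleConverter {
    // Matches a variable part in the Find rule: a literal backslash, d, {m} or {m,n}.
    static final Pattern VARIABLE = Pattern.compile("\\\\d\\{\\d+(?:,\\d*)?\\}");
    // Matches a placeholder in the Replace rule: \<skipN>
    static final Pattern PLACEHOLDER = Pattern.compile("\\\\<skip(\\d+)>");

    // Wrap each \d{...} in parentheses so it becomes a numbered capturing group.
    static String toRegex(String findRule) {
        return VARIABLE.matcher(findRule).replaceAll("($0)");
    }

    // Turn \<skipN> into $N, quoting the constant parts so any $ or \ stays literal.
    static String toReplacement(String replaceRule) {
        Matcher m = PLACEHOLDER.matcher(replaceRule);
        StringBuilder out = new StringBuilder();
        int last = 0;
        while (m.find()) {
            out.append(Matcher.quoteReplacement(replaceRule.substring(last, m.start())));
            out.append('$').append(m.group(1));
            last = m.end();
        }
        out.append(Matcher.quoteReplacement(replaceRule.substring(last)));
        return out.toString();
    }

    public static void main(String[] args) {
        String find = "ABC\\d{1,3}DEF\\d{1,2}G";        // Find rule as read from the file
        String replace = "MNOP\\<skip2>YZ\\<skip1>QX";   // Replace rule as read from the file
        String regex = toRegex(find);                    // ABC(\d{1,3})DEF(\d{1,2})G
        String replacement = toReplacement(replace);     // MNOP$2YZ$1QX
        System.out.println("ABC997DEF2G".replaceAll(regex, replacement)); // prints MNOP2YZ997QX
    }
}
```

With this approach the program no longer needs String.indexOf() to locate the delimiters; the two small patterns do that work.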

Thanks for any ideas,
Posts: 23832

Norm Radder
Thanks. I knew it was easy.
I had missed the $1 group references in the replacement string of the String.replaceAll() method.