Regular Expression: finding multiple lines

 
Greenhorn
Posts: 20
Hello!

Does anyone know how I can do a "search and replace" using a regex that matches across several lines?

For example, I want to open a TXT file and find:



and replace it with:



In other words, comment it out.

Thanks!
 
Rancher
Posts: 43081
Most regexp libraries can perform multiline matching. Search the javadocs of the java.util.regex.Pattern class for "MULTILINE".
 
André Campanini
Greenhorn
Posts: 20
Hi, again.

I found a topic in this forum and tested its code, but I don't think I understand how to use some of the methods very well, such as replaceAll. The topic is: https://coderanch.com/t/411621/java/java/Pattern-matches-but-never-replaces

Using that code, searching for "void myMethod(){}" works well, but if I change the text to span multiple lines, it doesn't work.

"
void myMethod()
{
}
"

I'm posting the code that I tested.

 
Ranch Hand
Posts: 266
How will you be replacing a method like this:


[ October 07, 2008: Message edited by: Piet Verdriet ]
 
Piet Verdriet
Ranch Hand
Posts: 266

Originally posted by Ulf Dittmer:
Most regexp libraries can perform multiline matching. Search the javadocs of the java.util.regex.Pattern class for "MULTILINE".



The MULTILINE option only changes how the anchors behave: the regex engine treats each line as if it were a complete String of its own, so each line begins with a ^ (beginning of String), has some contents, and ends with a $ (end of String).

What you're hinting at is probably the DOTALL option, which causes the dot to also match newline characters?
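
To make the difference between the two flags concrete, here is a minimal sketch. The class name `FlagDemo` and the sample text are assumptions made for illustration; only `Pattern.MULTILINE` and `Pattern.DOTALL` come from the discussion above.

```java
import java.util.regex.Pattern;

class FlagDemo {
    // Sample input (an assumption): an empty method spread over three lines.
    static final String TEXT = "void myMethod()\n{\n}";

    // MULTILINE only changes ^ and $ so that they match at line boundaries.
    static boolean lineAnchorsMatch() {
        return Pattern.compile("^\\{$", Pattern.MULTILINE).matcher(TEXT).find();
    }

    // Without DOTALL, '.' stops at line terminators, so ".*" cannot span lines.
    static boolean dotSpansLines(int flags) {
        return Pattern.compile("myMethod.*\\}", flags).matcher(TEXT).find();
    }

    public static void main(String[] args) {
        System.out.println(lineAnchorsMatch());                // true
        System.out.println(dotSpansLines(Pattern.MULTILINE));  // false
        System.out.println(dotSpansLines(Pattern.DOTALL));     // true
    }
}
```

So MULTILINE helps when you want to anchor something to the start or end of a line, while DOTALL is what lets a single `.*` run across line breaks.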
 
Piet Verdriet
Ranch Hand
Posts: 266
Try something like this:



But it will still break in a lot of cases! Better use a true Java source parser. ANTLR is an impressively easy to use parser generator.

Good luck!
 
Ranch Hand
Posts: 781

Originally posted by Piet Verdriet:
Try something like this:


But it will still break in a lot of cases! Better use a true Java source parser. ANTLR is an impressively easy to use parser generator.



I agree. Using a regex is normally a very poor approach for parsing recursive syntax.
 
André Campanini
Greenhorn
Posts: 20
Hello!

I got what I wanted doing this:

String regex = "(?m)void myMethod\\(\\)\r\n\\{\r\n\\}";
String newFileContent = fileContent.replaceAll(regex, "/*"+regex+"*/");

It wraps every matching method in a comment, which is just what I wanted. I don't know whether this is the right way to use a regex, but it is working now.

Regards!!!
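
One thing worth noting about the snippet above: the replacement string is built from the regex source rather than from the text that matched, so the literal "(?m)" ends up inside the generated comment. A minimal sketch of the same idea using the `$0` whole-match reference in the replacement follows. The class name `CommentOut` and the `\r?\n` tolerance for both line-ending styles are assumptions, not part of the original post.

```java
class CommentOut {
    // Matches the empty method across three lines. \r?\n accepts both
    // Windows and Unix line endings (an assumption about the input file).
    static final String REGEX = "void myMethod\\(\\)\\r?\\n\\{\\r?\\n\\}";

    static String commentOut(String fileContent) {
        // $0 in the replacement stands for the text that actually matched.
        return fileContent.replaceAll(REGEX, "/*$0*/");
    }

    public static void main(String[] args) {
        String src = "class A {\r\nvoid myMethod()\r\n{\r\n}\r\n}";
        System.out.println(commentOut(src));
    }
}
```

Because `$0` copies the matched text, the original line endings are preserved and nothing from the pattern itself leaks into the output.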
 
Piet Verdriet
Ranch Hand
Posts: 266

Originally posted by André Campanini:
Hello!

I got what I wanted doing this:

String regex = "(?m)void myMethod\\(\\)\r\n\\{\r\n\\}";
String newFileContent = fileContent.replaceAll(regex, "/*"+regex+"*/");

It wraps every matching method in a comment, which is just what I wanted. I don't know whether this is the right way to use a regex, but it is working now.

Regards!!!



You don't need the (?m) flag.

Doesn't your method have a body? If not (your method always looks the same), you don't need regex for it: a simple String.replace(...) will do.
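
A minimal sketch of the `String.replace(...)` route for the fixed-text case. The class name `LiteralReplace` and the Windows line endings in `TARGET` are assumptions carried over from the example earlier in the thread.

```java
class LiteralReplace {
    // The exact text to look for, including its line endings (assumed \r\n).
    static final String TARGET = "void myMethod()\r\n{\r\n}";

    static String commentOut(String fileContent) {
        // String.replace works on literal text, so ( ) { } need no escaping.
        return fileContent.replace(TARGET, "/*" + TARGET + "*/");
    }

    public static void main(String[] args) {
        System.out.println(commentOut("x\r\nvoid myMethod()\r\n{\r\n}\r\ny"));
    }
}
```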

If there is a method body, try your approach with the following:
 
André Campanini
Greenhorn
Posts: 20
The method HAS a body, I just didn't put it here this way in the example.
 
Piet Verdriet
Ranch Hand
Posts: 266

Originally posted by André Campanini:
The method HAS a body, I just didn't put it here this way in the example.



Ok. Then my points still stand: you don't need the (?m) flag. And try it with a String with these contents:

or

or

To name just three out of many, many things that can go wrong.
 
Author
Posts: 836
Take the advice given above: avoid regular expressions for this. If you need to change a method with just one signature (assuming it's provided by the user), you need to count braces and take care of certain exceptions (like when they're contained in strings as given above). The basic principle is simple though, and will probably be faster than regex:

Is that really difficult? Surely easier than regex?

Usual disclaimer: I haven't tested this and it is incomplete, so you'll need to finish and/or bug fix it yourself.
[ October 13, 2008: Message edited by: Charles Lyons ]
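
The code listing from the post above did not survive in this copy of the thread. What follows is a rough sketch of the brace-counting idea it describes, not the original code. The names `BraceCounter`, `endOfMethod` and `commentOut` are hypothetical, and braces inside strings, character literals and comments are deliberately not handled here.

```java
class BraceCounter {
    /**
     * Returns the index just past the '}' that closes the first method whose
     * signature text appears in the source, or -1 if the signature or a
     * balanced body is not found.
     * NOTE: braces in strings, chars and comments are NOT special-cased.
     */
    static int endOfMethod(String source, String signature) {
        int start = source.indexOf(signature);
        if (start < 0) return -1;
        int open = source.indexOf('{', start + signature.length());
        if (open < 0) return -1;
        int depth = 0;
        for (int i = open; i < source.length(); i++) {
            char c = source.charAt(i);
            if (c == '{') {
                depth++;
            } else if (c == '}' && --depth == 0) {
                return i + 1;   // back at depth 0: the method body is closed
            }
        }
        return -1;              // ran out of input before the braces balanced
    }

    /** Wraps the method in a block comment, or returns the source unchanged. */
    static String commentOut(String source, String signature) {
        int start = source.indexOf(signature);
        int end = endOfMethod(source, signature);
        if (end < 0) return source;
        return source.substring(0, start)
                + "/*" + source.substring(start, end) + "*/"
                + source.substring(end);
    }
}
```

Nested blocks inside the method are handled by the depth counter; everything that can fool it is discussed in the replies that follow.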
 
André Campanini
Greenhorn
Posts: 20
Thanks a lot for the tips! They will help me a lot, and they will help me clean up my code too!
 
Piet Verdriet
Ranch Hand
Posts: 266

Originally posted by Charles Lyons:
...Is that really difficult? Surely easier than regex?



Err, you make it sound too easy. Where the tricky part starts, you have a comment saying "TODO check for special cases to ignore". That's just the point: you can't simply catch all "special cases" in just one method: you need a true recursive descent parser. How are you planning to catch the special cases when a closing bracket is inside a String:

?

You could answer: "well, I'll count the opening quotes to see if it's inside a String literal". Okay, and what about Strings like these then:

?

My point here is (towards the OP): there exists no simple method that magically does what you ask with arbitrary source code. If that's fine with you, then go right ahead with a regex-solution! But be aware of all the things that can go wrong, and don't be amazed when your application breaks all of a sudden.

Whatever you do, best of luck!

[ October 13, 2008: Message edited by: Piet Verdriet ]
 
Charles Lyons
Author
Posts: 836

That's just the point: you can't simply catch all "special cases" in just one method: you need a true recursive descent parser.

I don't agree with that at all, based on my own experience writing mathematical parsers which did a similar thing. There are only a few places where an opening or closing brace can legally occur and where it doesn't group code into a block (which is part of the counting mechanism). If you don't believe me, read the Java Language Specification. Yes, there are plenty of examples of invalid code which will mess up the conversion, but they will also fail to compile so we can ignore them.

Off the top of my head you've hit the two places of interest for the missing key bit of code: inside a comment (either type) and inside a string or character. If you can think of another case (or find one in JLS), don't hesitate to quote it. The first and last cases are dead easy to consider given JLS rules. The string is slightly more tricky, but only needs a simple parser which understands string and character delimiters and as you've demonstrated, the escape mechanism (which is itself easy if you read character-by-character as in my algorithm above, and ignore the case \" as a closing string delimiter). You really don't need a full-blown code syntax parser which is going to add huge overheads and extra libraries to what should be a simple job. If I put just half an hour to this, I think I could write a fully-compliant application (for the benefit of the OP, I won't). That's probably less time than it takes to become familiar with an external library. Better yet, this can easily be written in portable C and avoid the start up times of the JVM---then it'll be lightning fast on thousands of files.
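
A sketch of what such a character-by-character scanner might look like, under stated assumptions. The names `BraceScanner`, `matchingBrace` and `skipLiteral` are hypothetical. Unicode escapes such as \u0022 and the text blocks of newer Java versions are not handled, so this is an illustration of the approach rather than a fully JLS-compliant tool.

```java
class BraceScanner {
    /**
     * Given the index of an opening '{', returns the index of its matching
     * '}', or -1 if none is found. Braces inside string literals, char
     * literals, line comments and block comments are skipped.
     */
    static int matchingBrace(String s, int open) {
        int depth = 0;
        int n = s.length();
        int i = open;
        while (i < n) {
            char c = s.charAt(i);
            if (c == '"' || c == '\'') {
                i = skipLiteral(s, i);              // jump past the closing quote
                continue;
            }
            if (c == '/' && i + 1 < n && s.charAt(i + 1) == '/') {
                int eol = s.indexOf('\n', i);       // line comment runs to end of line
                if (eol < 0) return -1;
                i = eol + 1;
                continue;
            }
            if (c == '/' && i + 1 < n && s.charAt(i + 1) == '*') {
                int close = s.indexOf("*/", i + 2); // block comment runs to next */
                if (close < 0) return -1;
                i = close + 2;
                continue;
            }
            if (c == '{') {
                depth++;
            } else if (c == '}' && --depth == 0) {
                return i;
            }
            i++;
        }
        return -1;
    }

    /** Returns the index just past the string or char literal starting at start. */
    static int skipLiteral(String s, int start) {
        char quote = s.charAt(start);
        int i = start + 1;
        while (i < s.length()) {
            char c = s.charAt(i);
            if (c == '\\') { i += 2; continue; }    // skip escaped char, e.g. \" or \\
            if (c == quote) return i + 1;
            i++;
        }
        return s.length();                          // unterminated literal: stop at end
    }
}
```

Checking for comments before counting braces also means an apostrophe inside a comment is never mistaken for the start of a character literal.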

After a little thought, there is one other case which can lead to the final code having a compile error: if the method already contains comments which end with */. In that case you can't just put a /* ... */ around the method. But this case needs to be considered as part of the routine above anyway, so it can be dealt with there: if we encounter a "*/" (which will be as part of a comment which we handle), replace it with something else (e.g. "*\/") or, for the courageous, convert that entire block to // comments. Slightly trickier, but still far from impossible.
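
One way to sidestep the nested "*/" problem is to prefix every line of the method with // instead of wrapping it in /* ... */. A minimal sketch follows; the class name `LineCommenter` is hypothetical, and locating the method's start and end is assumed to have been done already.

```java
class LineCommenter {
    /** Prefixes every line of the given block with "// ". */
    static String commentLines(String block) {
        // (?m) makes ^ match at the start of every line, not just the input,
        // so each line start receives the prefix.
        return block.replaceAll("(?m)^", "// ");
    }

    public static void main(String[] args) {
        System.out.println(commentLines("void m() {\n  /* old */\n}"));
    }
}
```

Incidentally, this is a case where the (?m) flag does real work, because the pattern relies on ^ matching at each line start.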

You saying one can't do it this way is just defeatist and may lead to markedly inefficient code---like many many things, you can do it with some careful forethought. From experience with Java one can cover all the sane cases. Analysing the JLS will cover all the cases (sane and not-so) guaranteeing the application won't "break all of a sudden". Of course, if you need to cover more situations than just this one, the benefits of a true syntax parser will outweigh the added complexity.
 
Piet Verdriet
Ranch Hand
Posts: 266

Originally posted by Charles Lyons:
...
You saying one can't do it this way
...



I didn't say that.
[ October 13, 2008: Message edited by: Piet Verdriet ]
 
Piet Verdriet
Ranch Hand
Posts: 266

Originally posted by Piet Verdriet:
...
My point here is (towards the OP): there exists no simple method
...



... sure, one can have different opinions about what a "simple" method is, but looking at all the cases I'd have to take into account, it isn't as trivial as I had the impression you led the OP to believe.

All IMO, of course.
[ October 13, 2008: Message edited by: Piet Verdriet ]
 
Charles Lyons
Author
Posts: 836

it isn't as trivial as I had the impression you led the OP to believe.

My apologies if that was the impression I conveyed. By "easy" I mean that it doesn't involve any complicated code or advanced libraries, and that it is a short program (I would hazard that the bulk of it can be accomplished in under 50 lines). The only techniques which need to be used are elementary string processing operations (namely charAt and indexOf). Naturally, as with all programs, you have to carefully think through the problem to come out with a decent and high-performance solution; isn't that part of the satisfaction that comes from writing a good program? Whether the solution is trivial depends on what background one comes from and how much experience they have (this is the intermediate, not beginner's, forum after all). Regardless, this can be done in a straightforward piece-by-piece way by thinking only about processing characters in a string linearly, starting with the simple program I gave and then building supplementary rules to cover the few special cases that exist. All good software is built in stages or modules, and this is no exception. I deliberately left the OP to think carefully about what would go in the // TODO bits of my code, since as you emphasised, those are the bits which require some intelligence!

It is all subjective though: one person's solution could easily be another's nightmare.
 
Piet Verdriet
Ranch Hand
Posts: 266

Originally posted by Charles Lyons:
My apologies if that was the impression I conveyed. By "easy" I mean that it doesn't involve any complicated code or advanced libraries, and that it is a short program ...



No problem! Looking at the OP's attempts at solving this, I get the impression that the OP's notion of "easy" might differ "slightly" from yours!


I don't say this to put the OP down, of course!

Thanks for your elaborate clarification Charles.

Regards,
Piet.
[ October 13, 2008: Message edited by: Piet Verdriet ]
 