
Regular expression to find comments

 
Greenhorn
Could you please say how to create a regex for the following task:

"Write a program that reads a Java source-code file (you provide the file name on the command line) and displays all the comments."

For now I have created this one: regex = "^[^\"]*(//.*)";

But:
1) it does not find comments like these /*....*/
2) it does not find comments that are located after " signs, like this: String s = "this is me"; //Comment
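Running the pattern from the question against a few sample lines shows both problems. This is a minimal sketch. The class and method names are made up for illustration, and it assumes the file is processed one line at a time, so the ^ anchor refers to the start of each line:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class OriginalRegexDemo {
    // The pattern from the question: anything except a double quote, then "//" to end of line.
    static final Pattern LINE_COMMENT = Pattern.compile("^[^\"]*(//.*)");

    /** Returns the comment found on this line, or null if the pattern does not match. */
    static String commentOn(String line) {
        Matcher m = LINE_COMMENT.matcher(line);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        System.out.println(commentOn("int x = 1; // counter"));               // found
        System.out.println(commentOn("String s = \"this is me\"; //Comment")); // null: quote before "//"
        System.out.println(commentOn("/* block comment */"));                 // null: no "//" at all
    }
}
```

The [^"]* part can never step past a double quote, which is why any comment that follows a string literal on the same line is missed.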
 
lowercase baba
Regular expressions are not easy to begin with. They are even more difficult when the thing you are searching for can span multiple lines.

Have you considered that they may NOT be the way to solve this particular problem?
 
Sheriff

Ruslan Salimych wrote: Could you please say how to create a regex for the following task:


Well, that would amount to us doing your homework for you, which we don't do around here.

Here's something to consider though:

You probably want to use more than one regex because there are quite a few ways comments can be written:


Of course, you may want to start with just a few simple cases first and then work your way up to handling the more complicated ones later.
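If you go the several-regexes route, one option is a pattern per kind of comment, each run over the whole file. This is a sketch under that assumption; the kind names and the class are made up for illustration. Because each pattern runs independently, a "//" inside a /* */ comment or inside a string literal would still be reported, which is one reason to combine the patterns into a single find later on.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class CommentKinds {
    // One pattern per kind of comment; DOTALL lets '.' cross line breaks in block comments.
    static final Map<String, Pattern> KINDS = new LinkedHashMap<>();
    static {
        KINDS.put("line", Pattern.compile("//[^\\r\\n]*"));
        // "/**" not immediately followed by '/', so the empty comment "/**/" is not Javadoc.
        KINDS.put("javadoc", Pattern.compile("/\\*\\*(?!/).*?\\*/", Pattern.DOTALL));
        // "/*" not followed by a Javadoc-style '*'; "/**/" still counts as a block comment.
        KINDS.put("block", Pattern.compile("/\\*(?!\\*(?!/)).*?\\*/", Pattern.DOTALL));
    }

    /** Runs each pattern independently and groups the matches by kind. */
    static Map<String, List<String>> findByKind(String source) {
        Map<String, List<String>> result = new LinkedHashMap<>();
        for (Map.Entry<String, Pattern> e : KINDS.entrySet()) {
            List<String> found = new ArrayList<>();
            Matcher m = e.getValue().matcher(source);
            while (m.find()) {
                found.add(m.group());
            }
            result.put(e.getKey(), found);
        }
        return result;
    }

    public static void main(String[] args) {
        String src = "/** Javadoc */\nint x; // line\n/* block */\n/**/";
        System.out.println(findByKind(src));
    }
}
```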

EDIT: And then there's the excellent point that Fred makes: Do you really have to / should you use regex for this?
 
Sheriff

it does not find comments like these /*....*/


I believe this is one of the situations where a regex cannot be found to do the job.

it does not find comments that are located after " signs, like this: String s = "this is me"; //Comment


I would go at it like this: find a regex that matches the comment at the beginning of the line (you've done this), find a regex that matches at the end of a line, then combine them with an alternation character (|).
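A sketch of that alternation, assuming one line is checked at a time (the class and method names are hypothetical). The first alternative is the regex from the question. The second allows one or more complete "..." pairs before the "//", so it handles the String s = "this is me"; //Comment case. It still assumes string literals contain no escaped \" quotes.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class AlternationDemo {
    // Alternative 1 (the question's regex): no double quote before the "//".
    // Alternative 2: one or more complete "..." pairs, then no quote, then the "//".
    static final Pattern COMMENT = Pattern.compile(
            "^[^\"]*(//.*)"
            + "|^(?:[^\"]*\"[^\"]*\")+[^\"]*(//.*)");

    /** Returns the end-of-line comment on this line, or null if neither alternative matches. */
    static String commentOn(String line) {
        Matcher m = COMMENT.matcher(line);
        if (!m.find()) {
            return null;
        }
        // Only the alternative that matched has a non-null group.
        return m.group(1) != null ? m.group(1) : m.group(2);
    }

    public static void main(String[] args) {
        System.out.println(commentOn("int x = 1; // note"));
        System.out.println(commentOn("String s = \"this is me\"; //Comment"));
        System.out.println(commentOn("System.out.println(\"Use // here\");"));
    }
}
```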
 
Saloon Keeper
For single line comments, you need to match "//" and everything that follows it, until the next end-of-line.

For multi-line comments, you need to match "/*" and everything that follows it, until the next "*/".

You can combine the two patterns into a single find operation.

Despite the general overuse of regular expressions, I think this is a prime example of a task for which regular expressions are perfect.
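One way to write that combined find in Java is sketched below. It matches "//" plus everything up to the end of the line, or "/*" plus the shortest run of characters up to "*/". The reluctant .*? stops a block comment at the first "*/", and Pattern.DOTALL lets it span lines. The main method reads the file named on the command line and assumes UTF-8. As the following posts point out, this version will also report "//" or "/*" sequences that appear inside string literals.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class CombinedFind {
    // Either "//" plus everything up to the end of the line,
    // or "/*" plus the shortest run of characters up to "*/".
    // DOTALL lets '.' match line breaks, so block comments can span lines.
    static final Pattern COMMENT =
            Pattern.compile("//[^\\r\\n]*|/\\*.*?\\*/", Pattern.DOTALL);

    static List<String> findComments(String source) {
        List<String> comments = new ArrayList<>();
        Matcher m = COMMENT.matcher(source);
        while (m.find()) {
            comments.add(m.group());
        }
        return comments;
    }

    public static void main(String[] args) throws IOException {
        // The file name comes from the command line; UTF-8 is assumed.
        String source = new String(Files.readAllBytes(Paths.get(args[0])), StandardCharsets.UTF_8);
        for (String comment : findComments(source)) {
            System.out.println(comment);
        }
    }
}
```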
 
Knute Snortum
Sheriff

For single line comments, you need to match "//" and everything that follows it, until the next end-of-line.


Hmm, what about this line?
 
Bartender

Ruslan Salimych wrote: Could you please say how to create a regex for the following task:
"Write a program that reads a Java source-code file (you provide the file name on the command line) and displays all the comments."


Well, the very first thing I'd do - before I write my first line of code or start worrying about regular expressions - is find out exactly what a "comment" in a Java source file is.

it does not find comments like these /*....*/


And this is just the type of thing I'm talking about. Can you describe the "rules" of a /*....*/ comment?
One that is plainly causing you a problem is that they are generally used for "multi-line" comments, but what about:
  /*.... /* .... */ .... */
or:
  /*.... /* .... */
or:
  // .... /* ....
  */
Are they valid "comments"?

Programming is NOT about coding; it is about thinking...and it often involves research too.
I don't know the answers to my question, but you can be darn sure I'd find out if I had to write this.

It's probably also worth mentioning that regexes are not good for some types of pattern-matching, and if '/*....*/' comments can be "nested", you'll have just run into one of them.
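For what it's worth, the Java Language Specification (section 3.7, Comments) answers these questions. Comments do not nest; "/*" and "*/" have no special meaning inside a "//" comment; and "//" has no special meaning inside a /* */ comment. So in the first example the comment ends at the first "*/" and the trailing ".... */" is treated as ordinary code. The second example is one valid comment. In the third, the "*/" on the next line is also just code. A reluctant quantifier happens to agree with the no-nesting rule, while a greedy one does not. Here is a small sketch with hypothetical names:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class NestingDemo {
    // Reluctant: stops at the FIRST "*/", which agrees with Java's no-nesting rule.
    static final Pattern RELUCTANT = Pattern.compile("/\\*.*?\\*/", Pattern.DOTALL);
    // Greedy: runs to the LAST "*/" in the input, swallowing code between comments.
    static final Pattern GREEDY = Pattern.compile("/\\*.*\\*/", Pattern.DOTALL);

    static String firstMatch(Pattern p, String source) {
        Matcher m = p.matcher(source);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        System.out.println(firstMatch(RELUCTANT, "/* a /* b */ c */"));
        System.out.println(firstMatch(GREEDY, "/* one */ int x; /* two */"));
    }
}
```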

HIH

Winston
 
Stephan van Hulst
Saloon Keeper

Knute Snortum wrote: Hmm, what about this line?


What about it? I can easily write a regex that will find the comment within that line.
 
Knute Snortum
Sheriff
Well, my point is that there is no comment in that line of code. The quote escapes the //. Or am I misunderstanding you?
 
Ruslan Salimovich
Greenhorn

Stephan van Hulst wrote:

Knute Snortum wrote: Hmm, what about this line?


What about it? I can easily write a regex that will find the comment within that line.



The thing is, there is no comment here; this is just a single string. But my regex from above handles this case fine.
 
Stephan van Hulst
Saloon Keeper
Ahh, I'm sorry. I thought you were just talking about the String literal; I didn't even consider the entire statement XD

Well, the same is true for comments within comments. You can solve this problem by adding alternatives to your regex that match character and string literals; the find algorithm will then match whichever construct it finds first. All you have to do then is discard matches that start with a single or double quote.
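A minimal sketch of that idea follows; the class and method names are hypothetical. Matcher.find() scans left to right, so whichever construct starts first wins. A "//" inside a string literal is swallowed by the string alternative, and a quote inside a comment is swallowed by the comment alternative. Allowing backslash escapes inside literals also covers Winston's "Use \"//\" to comment your code" example. This sketch assumes Unicode escapes have already been translated and does not handle text blocks (""").

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class LiteralAwareFind {
    // At each position the alternatives are tried in order; literals are listed
    // only so that their contents are consumed and never mistaken for comments.
    static final Pattern TOKEN = Pattern.compile(
            "\"(?:[^\"\\\\\\r\\n]|\\\\.)*\""   // string literal, allowing backslash escapes
            + "|'(?:[^'\\\\\\r\\n]|\\\\.)*'"   // character literal
            + "|//[^\\r\\n]*"                  // end-of-line comment
            + "|/\\*.*?\\*/",                  // traditional (block) comment
            Pattern.DOTALL);

    static List<String> findComments(String source) {
        List<String> comments = new ArrayList<>();
        Matcher m = TOKEN.matcher(source);
        while (m.find()) {
            String token = m.group();
            // Discard literals; keep only comments.
            if (!token.startsWith("\"") && !token.startsWith("'")) {
                comments.add(token);
            }
        }
        return comments;
    }

    public static void main(String[] args) {
        System.out.println(findComments("String s = \"this is me\"; //Comment"));
        System.out.println(findComments("System.out.println(\"Use \\\"//\\\" to comment your code\");"));
    }
}
```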
 
Marshal


Don't forget about the possibility of Unicode escapes in the source code.
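The compiler translates Unicode escapes before it looks for comments (JLS section 3.3), so the six-character sequence backslash, u, 0, 0, 2, F is a real "/". One option is to translate escapes first and then run the comment finder on the result. The sketch below is a hypothetical helper, not a library API. It follows the JLS rules that a backslash only starts an escape when preceded by an even number of contiguous raw backslashes, and that any number of 'u' characters may follow it.

```java
public class UnicodeEscapes {
    /**
     * Translates Unicode escapes (a backslash, one or more 'u' characters, four hex
     * digits). Malformed escapes are copied through unchanged; the compiler would
     * reject them anyway.
     */
    static String translate(String src) {
        StringBuilder out = new StringBuilder(src.length());
        int backslashRun = 0; // contiguous raw backslashes immediately before index i
        int i = 0;
        while (i < src.length()) {
            char c = src.charAt(i);
            if (c == '\\' && backslashRun % 2 == 0
                    && i + 1 < src.length() && src.charAt(i + 1) == 'u') {
                int j = i + 1;
                while (j < src.length() && src.charAt(j) == 'u') {
                    j++; // any number of 'u' characters is allowed
                }
                if (j + 4 <= src.length() && isHex4(src, j)) {
                    out.append((char) Integer.parseInt(src.substring(j, j + 4), 16));
                    i = j + 4;
                    backslashRun = 0; // a translated character never starts another escape
                    continue;
                }
            }
            out.append(c);
            backslashRun = (c == '\\') ? backslashRun + 1 : 0;
            i++;
        }
        return out.toString();
    }

    // True if the four characters starting at 'from' are ASCII hex digits.
    private static boolean isHex4(String s, int from) {
        for (int k = from; k < from + 4; k++) {
            if ("0123456789abcdefABCDEF".indexOf(s.charAt(k)) < 0) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        // The raw text contains two escapes that each translate to '/'.
        System.out.println(translate("int x; \\u002F\\u002F hidden comment"));
    }
}
```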
 
Winston Gutkowski
Bartender

Paul Clapham wrote: Don't forget about the possibility of Unicode escapes in the source code.


Good point mah man. Have a cow.

@Ruslan: See? More "rules" you need to know about if you want an industrial-strength solution.

Winston
 
Ruslan Salimovich
Greenhorn

Paul Clapham wrote:

Don't forget about the possibility of Unicode escapes in the source code.



Thank you!
So I see that I first have to define what a comment is, and only then start solving the problem, since there are a lot of variants of comments.
 
Winston Gutkowski
Bartender

Stephan van Hulst wrote: What about it? I can easily write a regex that will find the comment within that line.


Hmmm, really? And what about
  System.out.println("Use \"//\" to comment your code");
or indeed
  System.out.println("Use ""//"" to comment your code");
? (I forget if Java allows the last one, but several languages do)

The fact is that these are semantic rules, and regexes are NOT (generally) very good for dealing with them.

Winston
 
Winston Gutkowski
Bartender

Ruslan Salimych wrote: So I see that I first have to define what a comment is, and only then start solving the problem, since there are a lot of variants of comments.


Not just variants. Knute's post shows that there is also "context" to this problem - and that's where regexes run into problems.

1. A "start of comment" is ONLY the start of a comment if it is not in quotes.
2. Conversely, "quotes" (for the purposes of this problem) only exist outside a comment.

And let me remind you of the requirements:
"Write a program that reads a Java source-code file (you provide the file name on the command line) and displays all the comments."

No mention of regexes. You have assumed that they are the road to Nirvana - and I suspect you're wrong.
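Winston's two context rules map naturally onto a small state machine that reads the source one character at a time, with no regex at all. This is a minimal sketch, assuming valid Java input with no Unicode escapes or text blocks; the class and method names are hypothetical. An unterminated block comment at end of input is simply dropped.

```java
import java.util.ArrayList;
import java.util.List;

public class CommentScanner {
    private enum State { CODE, STRING, CHAR, LINE_COMMENT, BLOCK_COMMENT }

    static List<String> scan(String src) {
        List<String> comments = new ArrayList<>();
        State state = State.CODE;
        int start = 0; // index where the current comment began
        int i = 0;
        while (i < src.length()) {
            char c = src.charAt(i);
            char next = i + 1 < src.length() ? src.charAt(i + 1) : '\0';
            switch (state) {
                case CODE:
                    // Rule 1: a comment can only start outside quotes.
                    if (c == '/' && (next == '/' || next == '*')) {
                        state = next == '/' ? State.LINE_COMMENT : State.BLOCK_COMMENT;
                        start = i;
                        i += 2;
                        continue;
                    }
                    if (c == '"') state = State.STRING;
                    else if (c == '\'') state = State.CHAR;
                    break;
                case STRING:
                case CHAR:
                    if (c == '\\') { // skip the escaped character, e.g. \" or \'
                        i += 2;
                        continue;
                    }
                    if ((state == State.STRING && c == '"') || (state == State.CHAR && c == '\'')) {
                        state = State.CODE;
                    }
                    break;
                case LINE_COMMENT:
                    // Rule 2: quotes inside a comment are just comment text.
                    if (c == '\n' || c == '\r') {
                        comments.add(src.substring(start, i));
                        state = State.CODE;
                    }
                    break;
                case BLOCK_COMMENT:
                    if (c == '*' && next == '/') {
                        comments.add(src.substring(start, i + 2));
                        state = State.CODE;
                        i += 2;
                        continue;
                    }
                    break;
            }
            i++;
        }
        // A line comment may end at end of input without a newline.
        if (state == State.LINE_COMMENT) {
            comments.add(src.substring(start));
        }
        return comments;
    }

    public static void main(String[] args) {
        String src = "String s = \"a // b\"; // real\n/* block */ char q = '\"'; // two";
        System.out.println(scan(src));
    }
}
```

Because the scanner always knows whether it is inside a literal or a comment, the context that trips up a single regex is tracked explicitly instead.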

Winston
 
Stephan van Hulst
Saloon Keeper

Winston Gutkowski wrote: Hmmm, really? And what about [...]


Okay, I take back "easy". However, I would still use regular expressions in parts of my solution.
 