
regex code to recognize fully qualified Java class names in strings?

 
Joe Vahabzadeh
Ranch Hand
Posts: 140
All,

Ok, I'm trying to write a bit of Java code that reads .java files and notifies me when it finds a class name in quotes.

I've been struggling a bit with this, and I'm baffled.

I want to be able to see when I match something like:


where it picks up on the "com.myjob.MyClass"

Now there can be any arbitrary depth before the class name, i.e. it could be "com.myjob.mywidgets.myspecializedwidgets.MyClass" or something as simple as "com.MyClass".

I've been playing around with the matches(regex) method in String, but I'm not getting quite the results I want.

Any pointers?

Thanks!
 
Rob Spoor
Sheriff
Posts: 20550
How about you show us your current regular expression first, and we can tell you what's wrong with it. One hint - the package is zero or more occurrences of "something followed by a dot". I say zero because it's also possible to have a class in the default package.
 
Joe Vahabzadeh
Ranch Hand
Posts: 140
I think it's the dot character that's throwing me, because it also means "any character". In any case, I've tried the following:



Ok, so I know the + is wrong and should be a * because, as you said, it could be in the default package. I also know that all the packages follow the convention of all lowercase characters, and only the class names have mixed case.

So I already know that it should be something like:


And I tried it, but, while both of those regular expressions will capture "com.myjob.MyClass", they will also capture "commyjobMyClass"

I'm trying to say:
- any string of any number of characters (the rest of the line BEFORE what I'm looking for)
- followed by a double-quote character
- followed by zero or more occurrences of:
--- one or more lowercase letters followed by a period
- followed by an occurrence of:
--- one or more letters of any case.
- followed by a double-quote character
- followed by any string of any number of characters (the rest of the line AFTER what I'm looking for)

If I were to guess, I'd say my problem exists in this part: [[a-z]+\\.]* though I'm not sure how to fix it.
 
Matthew Brown
Bartender
Posts: 4567
Joe Vahabzadeh wrote:If I were to guess, I'd say my problem exists in this part: [[a-z]+\\.]* though I'm not sure how to fix it.

Do you mean ([a-z]+\\.)* ? I'm not quite sure what the effect of nesting [ ]s will be, but it's not what you're trying to do here.
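
To make the difference concrete, here is a minimal sketch. It assumes the rest of the expression follows the description given earlier in the thread (anything, a double quote, the package part, the class name, a double quote, anything) and uses + so that at least one package segment is required; the original expressions were not preserved, and the load(...) call in the sample lines is purely hypothetical. In Java, nesting [ ] inside [ ] forms a union of character classes, so [[a-z]+\\.] is a single class meaning "a lowercase letter, a plus sign, or a dot", not "letters followed by a dot".

```java
import java.util.regex.Pattern;

class NestedBracketsDemo {
    // Brackets: ONE character class (a-z, '+' or '.'), repeated one or more times.
    static final Pattern BRACKETS =
            Pattern.compile(".*\"[[a-z]+\\.]+[a-zA-Z]+\".*");

    // Parentheses: the GROUP "lowercase letters, then a literal dot", repeated.
    static final Pattern PARENS =
            Pattern.compile(".*\"([a-z]+\\.)+[a-zA-Z]+\".*");

    static boolean bracketsMatch(String line) {
        return BRACKETS.matcher(line).matches();
    }

    static boolean parensMatch(String line) {
        return PARENS.matcher(line).matches();
    }

    public static void main(String[] args) {
        // Hypothetical source lines; load(...) is not a real API.
        String dotted = "Object o = load(\"com.myjob.MyClass\");";
        String dotless = "Object o = load(\"commyjobMyClass\");";

        System.out.println(bracketsMatch(dotted));   // true
        System.out.println(bracketsMatch(dotless));  // true  (the unwanted match)
        System.out.println(parensMatch(dotted));     // true
        System.out.println(parensMatch(dotless));    // false
    }
}
```

Note that with * instead of + after the group, "commyjobMyClass" would match again, this time legitimately, as a class name in the default package.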
 
Joe Vahabzadeh
Ranch Hand
Posts: 140
That's exactly it - I was trying to nest something, but I am doing so incorrectly.

I didn't realize that parentheses could be used like that to nest something. Thanks, that seems to have solved this problem for me!
 
Winston Gutkowski
Bartender
Posts: 10427
Joe Vahabzadeh wrote:I didn't realize that parentheses could be used like that to nest something. Thanks, that seems to have solved this problem for me!

The real problem that you're likely to run into (as already stated by Rob) is that a class name doesn't necessarily have to have a dot in it. Also, there are several possibilities of strings that do contain dots that aren't class names. I suspect that you'll have to verify the result with something like Class.forName() if you want to be really sure.
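
A minimal sketch of such a check, assuming the classes of the scanned code base are on the classpath of the program doing the scanning (otherwise every lookup fails). The class and method names here are made up for illustration. The three-argument Class.forName is used with initialize set to false so that static initializers of the candidate class are not run.

```java
class ClassNameVerifier {
    /**
     * Returns true if the candidate string names a class that can be
     * loaded by this program's class loader.
     */
    static boolean isLoadableClass(String candidate) {
        try {
            // initialize=false: locate and load the class, but do not run
            // its static initializers.
            Class.forName(candidate, false,
                    ClassNameVerifier.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            // Not found, or found but not loadable: treat as "not a class name".
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(isLoadableClass("java.lang.String"));
        System.out.println(isLoadableClass("Error")); // needs java.lang.Error
    }
}
```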

Winston
 
Paul Clapham
Sheriff
Posts: 21137
The idea that Java class names will be composed entirely of the letters "a" to "z" is pretty parochial and shouldn't be used for real-world applications, where pretty much any Unicode letters can be used. (Check the Java Language Spec for the exact rules.) It's probably okay for personal use, though. Although you might want to consider the possibility that class names can also include digits -- unless you already did and I didn't notice where in the regex you did that.
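
For anyone who does need the broader rules, here is a rough sketch. java.util.regex exposes the Character.isJavaIdentifierStart and Character.isJavaIdentifierPart tests as the properties \p{javaJavaIdentifierStart} and \p{javaJavaIdentifierPart}, so a dotted sequence of identifiers can be matched without listing letters by hand. The class and method names are made up for illustration, and this does not exclude reserved words such as "class" or "int", which the language also forbids as identifiers.

```java
import java.util.regex.Pattern;

class UnicodeClassNamePattern {
    // One identifier: a valid start character, then any number of part characters.
    static final String IDENT =
            "\\p{javaJavaIdentifierStart}\\p{javaJavaIdentifierPart}*";

    // One or more identifiers separated by single dots.
    static final Pattern QUALIFIED =
            Pattern.compile(IDENT + "(\\." + IDENT + ")*");

    static boolean looksLikeQualifiedName(String s) {
        return QUALIFIED.matcher(s).matches();
    }

    public static void main(String[] args) {
        System.out.println(looksLikeQualifiedName("com.myjob.MyClass"));  // true
        System.out.println(looksLikeQualifiedName("com.2fast.MyClass"));  // false
    }
}
```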
 
Joe Vahabzadeh
Ranch Hand
Posts: 140
Winston and Paul,

You are both correct.

Winston - I will be using Class.forName() - but I'm filtering the strings first to make sure I have a String that is in quotes. If it passes the regex, then Class.forName() will be used to verify.

I've also switched back to assuming at least one dot - as for the particular source code that I need to run this program against, NONE of the classes are in the default package.

Paul - agreed, but for the source code that I'm running this program against, it's a known quantity that the package names consist of only lowercase letters, and the class names only consist of upper and lower case letters, and numbers (I've modified the regex slightly to reflect that).


Ultimately, though, my biggest stumbling block was the nested brackets, when I should've been using parentheses.
 
Winston Gutkowski
Bartender
Posts: 10427
Joe Vahabzadeh wrote:Winston - I will be using Class.forName()
Paul - agreed, but for the source code that I'm running this program against, it's a known quantity that the package names consist of only lowercase letters, and the class names only consist of upper and lower case letters, and numbers (I've modified the regex slightly to reflect that)...

There are a few other rules that you can apply too. For example: class names cannot start with a number (although, as Paul said, they can include numbers). You can find the exact rules in the JLS, and I would make sure your regex covers them all, because you don't want to be calling something as heavyweight as Class.forName() on a string that can't possibly be a class name.
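
As a rough illustration only, this is what a pre-filter might look like under the narrower assumptions stated for this code base: at least one package segment, packages made of lowercase ASCII letters, and class names made of ASCII letters and digits that do not start with a digit. It is not the regex actually used in the thread (that was not preserved), and the names are invented for the example. It is applied to the text between the quotes, not to the whole source line.

```java
import java.util.regex.Pattern;

class QuotedClassNameFilter {
    // One or more "lowercase letters + dot" segments, then a class name:
    // a letter followed by any number of letters or digits.
    static final Pattern CANDIDATE =
            Pattern.compile("([a-z]+\\.)+[A-Za-z][A-Za-z0-9]*");

    /** Cheap syntactic check to run before anything as heavy as Class.forName(). */
    static boolean isCandidate(String literalContent) {
        return CANDIDATE.matcher(literalContent).matches();
    }

    public static void main(String[] args) {
        System.out.println(isCandidate("com.myjob.MyClass2")); // true
        System.out.println(isCandidate("com.2Class"));         // false
        System.out.println(isCandidate("Error"));              // false (no package)
    }
}
```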

Joe Vahabzadeh wrote:Ultimately, though, my biggest stumbling block was the nested brackets, when I should've been using parentheses.

Welcome to the world of parsing, something that regex is definitely NOT good for. And don't forget about escaped/doubled quotes either.
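
To show the escaped-quote problem, here is a sketch that pulls string literals out of a single line with Matcher.find() and treats a backslash followed by any character as part of the literal, so \" does not end it. The names are invented for the example. It is still not a parser: comments, char literals such as '"', and multi-line text blocks will confuse it.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class StringLiteralScanner {
    // Opening quote, then any run of either (a) a character that is neither
    // a quote nor a backslash, or (b) a backslash plus one character,
    // then the closing quote. Group 1 is the raw content between the quotes.
    static final Pattern LITERAL =
            Pattern.compile("\"((?:[^\"\\\\]|\\\\.)*)\"");

    static List<String> literalsIn(String line) {
        List<String> found = new ArrayList<>();
        Matcher m = LITERAL.matcher(line);
        while (m.find()) {
            found.add(m.group(1)); // escapes are kept as written in the source
        }
        return found;
    }

    public static void main(String[] args) {
        // Source text: String s = "say \"hi\""; load("com.myjob.MyClass");
        // (load is hypothetical)
        String line = "String s = \"say \\\"hi\\\"\"; load(\"com.myjob.MyClass\");";
        System.out.println(literalsIn(line).size());  // 2
        System.out.println(literalsIn(line).get(1));  // com.myjob.MyClass
    }
}
```

Each extracted literal can then be handed to a class-name check, rather than trying to match the whole line in one expression.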

Winston
 
Paul Clapham
Sheriff
Posts: 21137
I don't know... my programs are full of string literals like "a" or "Error". These could of course be class names, but they aren't. Even if you get this regex working, you're going to have more false positives than true positives.

Edit: I see you're now requiring a package name. That's likely to reduce the false-positive level considerably from what I originally assumed.
 