• Post Reply Bookmark Topic Watch Topic
  • New Topic
programming forums Java Mobile Certification Databases Caching Books Engineering Micro Controllers OS Languages Paradigms IDEs Build Tools Frameworks Application Servers Open Source This Site Careers Other Pie Elite all forums
this forum made possible by our volunteer staff, including ...
Marshals:
  • Campbell Ritchie
  • Liutauras Vilda
  • Tim Cooke
  • Jeanne Boyarsky
  • Paul Clapham
Sheriffs:
  • Devaka Cooray
  • Ron McLeod
  • paul wheaton
Saloon Keepers:
  • Tim Moores
  • Piet Souris
  • Tim Holloway
  • Stephan van Hulst
  • Carey Brown
Bartenders:
  • Al Hobbs
  • Frits Walraven
  • Scott Selikoff

Regex question

 
Greenhorn
Posts: 12
  • Mark post as helpful
  • send pies
    Number of slices to send:
    Optional 'thank-you' note:
  • Quote
  • Report post to moderator
Hi,

I have been looking into the java.util.regex package. What I need to be able to do is test if a particular regex A is a subset of a regex B. That is, test whether it is true that any String matching A also matches B. Does anyone have an idea how I might do this? Of course it is not simply a case of testing whether the String representation of A matches with B, for example the String "[ab]d" clearly does not match the regex "[abc]d", however the regex "[ab]d" is clearly a subset of "[abc]d".

Many thanks,

Paul
 
author & internet detective
Posts: 41405
854
Eclipse IDE VI Editor Java
  • Mark post as helpful
  • send pies
    Number of slices to send:
    Optional 'thank-you' note:
  • Quote
  • Report post to moderator
Paul,
Welcome to JavaRanch!

I'm going to move this to our intermediate forum as it's not an easy thing to do (if it is possible at all.)
 
Jeanne Boyarsky
author & internet detective
Posts: 41405
854
Eclipse IDE VI Editor Java
  • Mark post as helpful
  • send pies
    Number of slices to send:
    Optional 'thank-you' note:
  • Quote
  • Report post to moderator
Paul,
Is this the original requirement? In other words, did someone tell you to compare arbitrary regular expressions. If not, can you post some more detail. There may be a better answer to a more specific question. Even if it just that you can only receive a limited subset of regular expressions.

f you are stuck comparing arbitrary regular expressions, it is hard. I know you are new to JavaRanch, but not if you are new to programming and computation theory. If so, this probably won't make sense.

I looked a little online (because I was curious what the answer was.) All of the discussions I can find on the topic involve state machines (which brings back memories from ny theory of computation class) and reg exp parsers. The best discussion I found was posted here.

If this hasn't scared you off yet, look at the post in that thread posted on "Jan 19, 2005 at 22:56". They give an "alogorithm" for it. I use quotes because it is really high level and hard to do.
 
Jeanne Boyarsky
author & internet detective
Posts: 41405
854
Eclipse IDE VI Editor Java
  • Mark post as helpful
  • send pies
    Number of slices to send:
    Optional 'thank-you' note:
  • Quote
  • Report post to moderator
After writing that, I think this is actually an advanced question! Moving again.
 
PaulN Pearson
Greenhorn
Posts: 12
  • Mark post as helpful
  • send pies
    Number of slices to send:
    Optional 'thank-you' note:
  • Quote
  • Report post to moderator
Thanks for your help. I am fairly new to Java, having studied it for less than a year, and to computation theory in general, coming from a mathematics background and recently having changed career to a development job. Hence the forum you pointed me towards was a bit on the technical side, and probably unnecessarily detailed for what I want to do.

The problem I was given at work was to implement a method which takes two arguments; a LinkedHashMap (which we'll call map) keyed by regex Strings mapped to an "id", and a String (str), and to return the id obtained from the "best match" key for str. The idea is that str represents a file path, and the regexes represent sets of paths which will each have a different id.

My first attempt at an implementation returns the id obtained from the first exact match. We would like to return the id associated with the most specific path. For example, our map may countain the keys "dir1/dir2/.*" and "dir1/.*". When matched against str = "dir1/dir2/file.ext" we would then use the first regex. Since it uses the first match, my implementation of course relies on the keys having been added to the map in order, from most specific to least. I would like my method to be able to reorder the regexes in case they have not already been added to the map in the correct order. The problem is that there may be, for example, different file extensions which str may have. The map may contain the following keys: "dir1/dir2/.*/file\\.ext1", "dir1/.*/file\\.ext1", "dir1/dir2/.*/file\\.ext2" and "dir1/.*/file\\.ext2".
We would then require that str = "dir1/dir2/dir3/file.ext1" used the first of these regexes, while str = "dir1/file.ext2" used the last. As in this example, the regexes may not possess a natural total order. I was hoping to reorder the keys into totally ordered subsets, each subset running from most specific to least, which is satisfied by the order of the four regexes given above. This wouldn't have been hard if there was a quick and easy way of checking whether regex A was a subset of regex B, but since this is not the case, I will need to find a solution which works for our needs.

I do not have full knowledge of the type of regexes we will be using, and I don't think this has been fully decided yet, so arriving at a good enough implementation may not be possible just yet.
 
author
Posts: 14112
  • Mark post as helpful
  • send pies
    Number of slices to send:
    Optional 'thank-you' note:
  • Quote
  • Report post to moderator
Well, in your specific example, I noticed that the less specific regex actually would match against the more specific regex. Of course it's not hard to construct regexes for which this doesn't work, but perhaps it helps you...
 
Ilja Preuss
author
Posts: 14112
  • Mark post as helpful
  • send pies
    Number of slices to send:
    Optional 'thank-you' note:
  • Quote
  • Report post to moderator
Another thing I noticed is that in your example, the more specific regex is longer than the less specific. Of course this also would be just a very rough heuristic...
 
All of the world's problems can be solved in a garden - Geoff Lawton. Tiny ad:
the value of filler advertising in 2021
https://coderanch.com/t/730886/filler-advertising
reply
    Bookmark Topic Watch Topic
  • New Topic