• Post Reply Bookmark Topic Watch Topic
  • New Topic

Generating a String based on a Regular Exprssion  RSS feed

 
Amit K Srivastava
Greenhorn
Posts: 23
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
Hi,

I need to generate a String based on a given regular exression. Currently,
relying on a trivial approach to keep on generating a char sequence randomly
until it gets validated against the regular expression (using the
java.util.regex APIs).

Suggestions for any other optimized approach/ method/ APIs available which
can help ?

Regards,
Amit Srivastava.
 
Jim Yingst
Wanderer
Sheriff
Posts: 18671
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
Well, that's an unusual question. I think you'll probably have to parse the regex yourself to understand it in terms of its constituent parts, and develop a series of rules for generating random strings based on each type of feature. For example, for a character class like [xyz], you could count up how many characters are in the class, total, and generate a random number to choose one of the characters. Or for alternation (foo|bar), just assign a 50% chance to pick the thing on the left (foo), and a 50% chance to pick the one on the right (bar).

Be aware that there is no single unique way to do this. It may be tempting to say that all possibilities are "equally probable", but this idea becomes impossible when you get to something like (.*). That could be an infinite string. Or in Java, a string of length Integer.MAX_INTEGER (or however much you hae memory for). You probably don't want that though. For variable quantifiers like * or +, I would generate them one character at a time, and at each character, say there's, oh, a 10% chance that the * terminates right there. So the string will be as long as it takes before you happen to generate a random number < 0.10. It could be really long, but long strings are less likely than short ones. This is a good thing, probably.

There may be some libraries out there that would help you in parsing the regex. I don't just mean like Pattern.parse(), but something that generates a semantic structure for the regex. Perhaps a good idea would be to study the source code of Pattern.compile(), or of other regex parsers like OROMatcher or many others. You can figure this out yourself, but it would probably speed things up considerably to steal ideas from existing code. I doubt you'll find any libraries to generate random strings though - that part you'll almost certainly need to make for yourself.

Hope that helps...
[ May 17, 2006: Message edited by: Jim Yingst ]
 
Amit K Srivastava
Greenhorn
Posts: 23
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
Thanks Jim,

Fine parsing could be achieved taking the concepts from existing regex parsers.
After parsing, one approach for generating String based on regex could be based on Finite State Automation

It takes:
1. Convert the regular expression's parsed components to the Finite State Machine (FSM)
2. Generating the sequence of characters in password while traversing the FSM until the final state is attained.


The major issue with the above approach is the complexity.


The determination of the total no. of sample space for the next state transition will become complex if the regular expression contains the sequence like pre defined character classes:

\d representing a digit: [0-9]
\D representing a non-digit: [^0-9]

Thanks,
Amit S
 
Ilja Preuss
author
Sheriff
Posts: 14112
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
Can you tell us *why* you want to do this?
 
Amit K Srivastava
Greenhorn
Posts: 23
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
The intent is to generate a password based on a password policy (Policy may specify a regular expression).
 
Jim Yingst
Wanderer
Sheriff
Posts: 18671
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
Ah. You know, for something like that, I think allowing the full power of regular expressions would be overkill which makes your problem much harder than it needs to be. Why not just have a short list of control parameters, perhaps set in a properties file? E.g. allow the system admin to specify things like

length.min = 6
length.max = 10
require.one.digit = yes
require.mixed.case = no

I think something like this would be much easier to code...
[ May 17, 2006: Message edited by: Jim Yingst ]
 
Paul Clapham
Sheriff
Posts: 22713
43
Eclipse IDE Firefox Browser MySQL Database
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
Originally posted by Amit K Srivastava:
...based on a password policy (Policy may specify a regular expression).
I'm sure that policy wasn't your idea. But it seems to me that if the general public was asked to use a password that satisfied a certain regular expression, complete confusion would ensue. So presumably ordinary people aren't being subjected to that policy?
 
Ilja Preuss
author
Sheriff
Posts: 14112
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
What Jim said...
 
Amit K Srivastava
Greenhorn
Posts: 23
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
Well I agree with Jim using the regex with full power would be an overkill. So, allowing a subset of it in the regex will reduce the effort considerably.

Thanks Again.
--Amit S
 
  • Post Reply Bookmark Topic Watch Topic
  • New Topic
Boost this thread!