• Post Reply Bookmark Topic Watch Topic
  • New Topic
programming forums Java Mobile Certification Databases Caching Books Engineering Micro Controllers OS Languages Paradigms IDEs Build Tools Frameworks Application Servers Open Source This Site Careers Other Pie Elite all forums
this forum made possible by our volunteer staff, including ...
Marshals:
  • Campbell Ritchie
  • Tim Cooke
  • Devaka Cooray
  • Ron McLeod
  • Jeanne Boyarsky
Sheriffs:
  • Liutauras Vilda
  • paul wheaton
  • Junilu Lacar
Saloon Keepers:
  • Tim Moores
  • Stephan van Hulst
  • Piet Souris
  • Carey Brown
  • Tim Holloway
Bartenders:
  • Martijn Verburg
  • Frits Walraven
  • Himai Minh

Generating a String based on a Regular Exprssion

 
Greenhorn
Posts: 23
  • Mark post as helpful
  • send pies
    Number of slices to send:
    Optional 'thank-you' note:
  • Quote
  • Report post to moderator
Hi,

I need to generate a String based on a given regular exression. Currently,
relying on a trivial approach to keep on generating a char sequence randomly
until it gets validated against the regular expression (using the
java.util.regex APIs).

Suggestions for any other optimized approach/ method/ APIs available which
can help ?

Regards,
Amit Srivastava.
 
Wanderer
Posts: 18671
  • Mark post as helpful
  • send pies
    Number of slices to send:
    Optional 'thank-you' note:
  • Quote
  • Report post to moderator
Well, that's an unusual question. I think you'll probably have to parse the regex yourself to understand it in terms of its constituent parts, and develop a series of rules for generating random strings based on each type of feature. For example, for a character class like [xyz], you could count up how many characters are in the class, total, and generate a random number to choose one of the characters. Or for alternation (foo|bar), just assign a 50% chance to pick the thing on the left (foo), and a 50% chance to pick the one on the right (bar).

Be aware that there is no single unique way to do this. It may be tempting to say that all possibilities are "equally probable", but this idea becomes impossible when you get to something like (.*). That could be an infinite string. Or in Java, a string of length Integer.MAX_INTEGER (or however much you hae memory for). You probably don't want that though. For variable quantifiers like * or +, I would generate them one character at a time, and at each character, say there's, oh, a 10% chance that the * terminates right there. So the string will be as long as it takes before you happen to generate a random number < 0.10. It could be really long, but long strings are less likely than short ones. This is a good thing, probably.

There may be some libraries out there that would help you in parsing the regex. I don't just mean like Pattern.parse(), but something that generates a semantic structure for the regex. Perhaps a good idea would be to study the source code of Pattern.compile(), or of other regex parsers like OROMatcher or many others. You can figure this out yourself, but it would probably speed things up considerably to steal ideas from existing code. I doubt you'll find any libraries to generate random strings though - that part you'll almost certainly need to make for yourself.

Hope that helps...
[ May 17, 2006: Message edited by: Jim Yingst ]
 
Amit K Srivastava
Greenhorn
Posts: 23
  • Mark post as helpful
  • send pies
    Number of slices to send:
    Optional 'thank-you' note:
  • Quote
  • Report post to moderator
Thanks Jim,

Fine parsing could be achieved taking the concepts from existing regex parsers.
After parsing, one approach for generating String based on regex could be based on Finite State Automation

It takes:
1. Convert the regular expression's parsed components to the Finite State Machine (FSM)
2. Generating the sequence of characters in password while traversing the FSM until the final state is attained.


The major issue with the above approach is the complexity.


The determination of the total no. of sample space for the next state transition will become complex if the regular expression contains the sequence like pre defined character classes:

\d representing a digit: [0-9]
\D representing a non-digit: [^0-9]

Thanks,
Amit S
 
author
Posts: 14112
  • Mark post as helpful
  • send pies
    Number of slices to send:
    Optional 'thank-you' note:
  • Quote
  • Report post to moderator
Can you tell us *why* you want to do this?
 
Amit K Srivastava
Greenhorn
Posts: 23
  • Mark post as helpful
  • send pies
    Number of slices to send:
    Optional 'thank-you' note:
  • Quote
  • Report post to moderator
The intent is to generate a password based on a password policy (Policy may specify a regular expression).
 
Jim Yingst
Wanderer
Posts: 18671
  • Mark post as helpful
  • send pies
    Number of slices to send:
    Optional 'thank-you' note:
  • Quote
  • Report post to moderator
Ah. You know, for something like that, I think allowing the full power of regular expressions would be overkill which makes your problem much harder than it needs to be. Why not just have a short list of control parameters, perhaps set in a properties file? E.g. allow the system admin to specify things like

length.min = 6
length.max = 10
require.one.digit = yes
require.mixed.case = no

I think something like this would be much easier to code...
[ May 17, 2006: Message edited by: Jim Yingst ]
 
Sheriff
Posts: 27451
88
Eclipse IDE Firefox Browser MySQL Database
  • Mark post as helpful
  • send pies
    Number of slices to send:
    Optional 'thank-you' note:
  • Quote
  • Report post to moderator

Originally posted by Amit K Srivastava:
...based on a password policy (Policy may specify a regular expression).

I'm sure that policy wasn't your idea. But it seems to me that if the general public was asked to use a password that satisfied a certain regular expression, complete confusion would ensue. So presumably ordinary people aren't being subjected to that policy?
 
Ilja Preuss
author
Posts: 14112
  • Mark post as helpful
  • send pies
    Number of slices to send:
    Optional 'thank-you' note:
  • Quote
  • Report post to moderator
What Jim said...
 
Amit K Srivastava
Greenhorn
Posts: 23
  • Mark post as helpful
  • send pies
    Number of slices to send:
    Optional 'thank-you' note:
  • Quote
  • Report post to moderator
Well I agree with Jim using the regex with full power would be an overkill. So, allowing a subset of it in the regex will reduce the effort considerably.

Thanks Again.
--Amit S
 
When you have exhausted all possibilities, remember this: you haven't - Edison. Tiny ad:
the value of filler advertising in 2021
https://coderanch.com/t/730886/filler-advertising
reply
    Bookmark Topic Watch Topic
  • New Topic