Regex help.

 
Ranch Hand
Posts: 3451
I need to parse a String that can be formatted in a variety of ways. The String is a feet-inch-fraction format. Any of the following values are valid:

I can grind it out with several different patterns but it seems there should be a better way.
Thanks in advance,
Michael Morris
 
"The Hood"
Posts: 8521
Gee, aren't WE a bunch of help.
It looks like it will be messy no matter how you handle it.
 
Michael Morris
Ranch Hand
Posts: 3451
Thanks for noticing, at least, Cindy. I'm just grinding it out. The code is nasty-looking, but so far it seems to be working. It's difficult to set up a unit test for it and be totally satisfied that you've considered all the combinations.
Michael Morris
 
Sheriff
Posts: 7023
Just in case you didn't already know about it, you might have some success browsing through http://www.regexlib.com/
 
Michael Morris
Ranch Hand
Posts: 3451
Thanks, Dirk. That looks like a pretty neat site. I've been away from Unix too long. I used to be able to grep with the best of 'em. I'm having to recall most of the regex stuff.
Michael Morris
 
Ranch Hand
Posts: 336
The String is a feet-inch-fraction format.

Not exactly sure what that means. Also, is this merely to validate your string, or would you like to glean out the values?
Any of the following values are valid:
Mike, can you please list some strings which would be invalid?
Cheers,
Leslie
 
Michael Morris
Ranch Hand
Posts: 3451
Hi Leslie,
I knew I could count on a PERL evangelist to come thru on this.
We have an app that generates steel fabrication drawings. When we first started, some 15 years ago, everything was input by hand; now almost everything is read in from a neutral file generated by a structural modeling program. The feet-inch-fraction format is actually feet, inches and sixteenths of an inch. 95% of the time a value would look like 23'-2 9/16, meaning 23 feet, 2 and nine-sixteenths inches. But it can also be floating-point or integral feet, like 1.8125, or even whole, integral or mixed-number inches, like 3.0625". Note that in the first example above either 23'2 9/16 or 23-2 9/16 would also be legal, and an optional inch tick could follow: 23'-2 9/16". Illegal values would be any string containing a character other than a digit, single quote, dash, period, double quote or forward slash. No white space is allowed, except that at least one space must occur between the inch integer and the inch fraction (exterior white space is OK, since it can be trim()ed). Obviously any out-of-sequence separators would be illegal, as would a missing feet value followed by the feet tick, like '-2 3/4.
If you need more info let me know.
Michael Morris
 
Michael Morris
Ranch Hand
Posts: 3451


Also, is this to merely validate your string or would you like to glean out the values?


The plan is to either throw a NumberFormatException on an invalid string or return a double from a static method.
Michael Morris
 
author
Posts: 3252
What about
^\d+'?[ -]?(?:\d*(?: ?\d+/\d+)?"?)?$|^\d*.\d+['"]?$
I'm not in a position to test this though. It consists of two alternative regexps. The first is the fractional syntax
^\d+'? feet, optionally followed by '
[ -]? optional separator
(?:\d* whole number of inches, inside an optional non-capturing group
(?: ?\d+/\d+)? fractional number of inches, inside another optional group
"?)?$ the optional closing "
The second uses decimal syntax
^\d* integral part, zero or more digits
.\d+ fractional part, one or more digits
['"]?$ optional feet or inch marker
Probably not 100% there, but it seems to go some way. Not tested, as I said, so no guarantees whatsoever.
- Peter
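For anyone wanting to try the pattern above from Java, here is a minimal sketch. The class and method names are hypothetical, and the second pattern (with the dot escaped as `\.`) is a variant added here for comparison, not something posted in this thread. As written, the `.` in the decimal alternative matches any character, which may account for some unexpected matches.

```java
import java.util.regex.Pattern;

public class PeterPatternCheck {
    // The pattern exactly as posted; the '.' in the second alternative is unescaped.
    static final Pattern POSTED = Pattern.compile(
        "^\\d+'?[ -]?(?:\\d*(?: ?\\d+/\\d+)?\"?)?$|^\\d*.\\d+['\"]?$");

    // Hypothetical variant for comparison: identical except the dot is escaped.
    static final Pattern ESCAPED = Pattern.compile(
        "^\\d+'?[ -]?(?:\\d*(?: ?\\d+/\\d+)?\"?)?$|^\\d*\\.\\d+['\"]?$");

    static boolean posted(String s) {
        return POSTED.matcher(s).matches();
    }

    static boolean escaped(String s) {
        return ESCAPED.matcher(s).matches();
    }

    public static void main(String[] args) {
        // "1x5" is an illustrative bad input, chosen to exercise the unescaped dot.
        String[] samples = {"23'-2 9/16", "1.8125", "3.0625\"", "'-2 3/4", "1x5"};
        for (String s : samples) {
            System.out.println(s + " -> posted=" + posted(s) + ", escaped=" + escaped(s));
        }
    }
}
```

With the unescaped dot, an input such as 1x5 is accepted by the decimal alternative; the escaped variant rejects it while still accepting 1.8125 and 3.0625".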
 
Michael Morris
Ranch Hand
Posts: 3451
Thanks for the regex, Peter. It seems to work on most everything but gives some false positives. I think I'm going to go with a divide-and-conquer strategy. I am going to look at splitting using "['\-]". That should give me the feet side and inch side in separate strings, or just the whole string. Then the regexes become much easier to work with.
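A rough sketch of what that divide-and-conquer step could look like in Java follows. Everything in it is an assumption made for illustration: the class name, the separator pattern `'-?|-`, and the two per-side patterns are not from this thread. One detail worth noting is that splitting 23'-2 9/16 on the character class "['\-]" yields an empty token between the tick and the dash, so the sketch treats "tick, dash, or tick plus dash" as a single separator.

```java
import java.util.Arrays;
import java.util.regex.Pattern;

public class SplitSketch {
    // Feet/inch separator: a feet tick, a dash, or a tick followed by a dash.
    private static final Pattern SEPARATOR = Pattern.compile("'-?|-");
    // Integral or decimal feet, e.g. 23 or 1.8125.
    private static final Pattern FEET = Pattern.compile("\\d+|\\d*\\.\\d+");
    // Whole, decimal, fractional or mixed inches, with an optional inch tick.
    private static final Pattern INCHES =
        Pattern.compile("(?:\\d+|\\d*\\.\\d+|(?:\\d+ +)?\\d+/\\d+)\"?");

    static String[] split(String raw) {
        // Limit 2: split at most once, so a stray second separator
        // stays in the inch side and fails validation there.
        return SEPARATOR.split(raw.trim(), 2);
    }

    static boolean isValid(String raw) {
        String[] parts = split(raw);
        if (parts.length == 1) {
            // No separator: the whole string is either feet or inches.
            return FEET.matcher(parts[0]).matches()
                || INCHES.matcher(parts[0]).matches();
        }
        // An empty inch side covers feet-only input such as 23'.
        return FEET.matcher(parts[0]).matches()
            && (parts[1].isEmpty() || INCHES.matcher(parts[1]).matches());
    }

    public static void main(String[] args) {
        // The character-class split leaves an empty middle token.
        System.out.println(Arrays.toString("23'-2 9/16".split("['\\-]")));
        // The combined separator gives just the feet side and the inch side.
        System.out.println(Arrays.toString(split("23'-2 9/16")));
        System.out.println(isValid("'-2 3/4"));
    }
}
```

The per-side patterns here are deliberately loose (for example, they accept decimal feet followed by inches), so they would need tightening against the real rules before use.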

Thanks everyone,
Michael Morris
[ March 25, 2003: Message edited by: Michael Morris ]
 
Leslie Chaim
Ranch Hand
Posts: 336
Hi Michael,
I'm sorry I did not get back sooner; I just got a bit overworked at work. Furthermore, when I give a solution I always get carried away with my palaver, and all that takes time. Nevertheless, I feel somewhat obligated to post a decent reply after you went through all the trouble of explaining your problem.
I understand that you are opting for (or thinking about) the divide-and-conquer approach, but for me it's about the challenge and passion of regex, not necessarily about Perl. (BTW it's Perl not PERL.) Your problem is interesting and the challenge is worth it!
I knew I could count on a PERL evangelist to come thru on this .
Well, thanks Mike. Actually, and for the record, my Perl obsessions root back to the MRE book, and it's all Jeff's fault.
OK, on to the matter:
After (I hope) carefully reading your description, it seems to me that you need a pattern as follows:
  • NUMBER (integral or floating) followed by
  • Feet or inch symbol (?) FB
  • Dash (?) FB
  • Integer (?) FB
  • One space and the fraction (?) FB
  • Inch tick (?)


    Yes, 95% of the time you will have all the data, but except for the first NUMBER everything else must be optional, for obvious reasons. Notice that "space and the fraction" is one unit. It's always good to be as specific as possible with the regex engine; this helps avoid false positives (whatever they're supposed to mean).
    I purposely did not give the regex first, so that you (and I) have a starting point for constructing the pattern. If you find any problem with the actual regex I proposed, we should first check whether we described the sought pattern correctly.
    Here's the regex:
    ^(\d*\.?\d*[1-9]+\d*|[1-9]+\d*\.\d*)['"]?-?\d?( \d{1,2}/\d{1,2})?"?$
    And the breakdown, they correspond one-to-one with the previous list:
  • (\d*\.?\d*[1-9]+\d*|[1-9]+\d*\.\d*)
  • ['"]?
  • -?
  • \d?
  • ( \d{1,2}/\d{1,2})?
  • "?


    The entire expression is wrapped in '^' and '$'. The NUMBER is restricted so that it does not allow zero as a value. The fraction is also restricted: it allows at most two digits each for the numerator and the denominator. Of course, '\d' is really [0-9], and sometimes you may want to be more specific in the regex, but regex is not for everything and we can use other tools too.
    And yes I asked Perl for some help:

    Here's what I've tested:


    $ perl checkit
    Data: .25
    It's a match!
    Data: 23'-2 9/16
    It's a match!
    Data: '-2 3/4
    No Match
    Data: .25'
    It's a match!
    Data: .25"
    It's a match!
    Data: 10
    It's a match!
    Data: 0
    No Match

    Data: 2'
    It's a match!
    Data: 3"
    It's a match!
    Data: 1.37'
    It's a match!
    Data: 1'2
    It's a match!
    Data: 1'-2
    It's a match!
    Data: 1-2 3/4
    It's a match!
    Data: 3'- 13/16
    It's a match!
    Data: 3' 2
    No Match
    Data: 3' 2"
    No Match

    Data: 2'3"
    It's a match!
    Data: 3'- 20/100
    No Match
    Data: <CTRL D>
    $
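For Java readers, a rough equivalent of this check might look like the sketch below. It is not the Perl script used above; the class and method names are hypothetical, and only a handful of the inputs listed above are included.

```java
import java.util.regex.Pattern;

public class CheckIt {
    // The regex as given above, with Java string escaping applied.
    static final Pattern FIF = Pattern.compile(
        "^(\\d*\\.?\\d*[1-9]+\\d*|[1-9]+\\d*\\.\\d*)['\"]?-?\\d?( \\d{1,2}/\\d{1,2})?\"?$");

    static boolean isMatch(String data) {
        // matches() requires the whole input to match the pattern.
        return FIF.matcher(data).matches();
    }

    public static void main(String[] args) {
        String[] data = {".25", "23'-2 9/16", "'-2 3/4", "10", "0", "3' 2", "3'- 20/100"};
        for (String d : data) {
            System.out.println("Data: " + d);
            System.out.println(isMatch(d) ? "It's a match!" : "No Match");
        }
    }
}
```

Because the inch part is `\d?`, the pattern accepts at most one whole-inch digit after the separator, so a value with 10 or 11 inches would need that piece widened.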


    I hope that this is good enough for you. The next thing you may want is to pluck out the values; for that you'd add some capturing parens and obtain the values from the Matcher.group() method. That, I think, is simple.
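A minimal sketch of that last step, under several assumptions that go beyond what is stated in this thread: the result is returned in decimal feet, a bare number is treated as feet, a number followed only by an inch tick is treated as inches, and the whole-inch group is widened to `\d{1,2}` so that 10 and 11 inches are accepted. The class and method names are hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class FeetInchParser {
    private static final Pattern FIF = Pattern.compile(
        "^(\\d*\\.?\\d*[1-9]+\\d*|[1-9]+\\d*\\.\\d*)" // group 1: NUMBER
        + "(['\"])?"                                   // group 2: feet or inch tick
        + "-?"                                         // optional dash
        + "(\\d{1,2})?"                                // group 3: whole inches (assumed 1-2 digits)
        + "(?: (\\d{1,2})/(\\d{1,2}))?"                // groups 4 and 5: fraction
        + "\"?$");                                     // optional closing inch tick

    /** Returns the value in decimal feet (an assumption), or throws NumberFormatException. */
    public static double parseFeet(String text) {
        Matcher m = FIF.matcher(text.trim());
        if (!m.matches()) {
            throw new NumberFormatException("Not a feet-inch-fraction value: " + text);
        }
        double number = Double.parseDouble(m.group(1));
        boolean hasInchPart = m.group(3) != null || m.group(4) != null;

        if ("\"".equals(m.group(2))) {
            // NUMBER was tagged as inches, so nothing else may follow it.
            if (hasInchPart) {
                throw new NumberFormatException("Inches followed by more inches: " + text);
            }
            return number / 12.0;
        }

        double inches = m.group(3) == null ? 0.0 : Integer.parseInt(m.group(3));
        if (m.group(4) != null) {
            int denominator = Integer.parseInt(m.group(5));
            if (denominator == 0) {
                throw new NumberFormatException("Zero denominator: " + text);
            }
            inches += (double) Integer.parseInt(m.group(4)) / denominator;
        }
        return number + inches / 12.0;
    }

    public static void main(String[] args) {
        System.out.println(parseFeet("23'-2 9/16"));
        System.out.println(parseFeet("3.0625\""));
    }
}
```

Using numbered groups keeps the sketch close to the breakdown above; named groups would read better if the pattern grows.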
    With pleasure,
    Leslie
    [ March 26, 2003: Message edited by: Leslie Chaim ]
     
    Michael Morris
    Ranch Hand
    Posts: 3451
    Hi Leslie,
    Thanks for the effort. It certainly appears to cover all the bases.


    (BTW it's Perl not PERL.)


    Duly noted. I know that I still get upset when I see someone writing or saying X Windows, when everyone should know that it is the X Window System. Pluralizing it is just wrong.
    Anyway, if you ever find yourself in the Longview/Tyler, Texas area, look me up and I'll treat you to a longneck and a rare steak.
    Michael Morris
     