Parsing a date from arbitrary text

Basil Bourque
Posts: 7
Is there any robust implementation of a parser for extracting a date from arbitrary text?
By arbitrary, I mean anything a user may type into a text field on an HTML form, for example.
The DateFormat class included with Java parses dates from text, but you must specify the _exact_ format of the text. If I knew the exact format, and the user used it perfectly, I wouldn't need a parser library!
I'm looking for a more robust parser library that can make sense of varied input, especially a locale-aware (i18n) one.
--Basil Bourque
Mark Vedder
Ranch Hand
Posts: 624
"Holy unvalidated input, Batman!"
Parsing an arbitrary "anything goes" string into a Date object would be pretty hard even if you were limiting it to just a single locale. Throw in I18N and that is a pretty tall order indeed. Just think of all the possibilities:
  • order of month, day, and year
  • abbreviations vs. fully typed month and day names
  • 4-digit year vs. 2-digit year
  • 1- vs. 2-digit month and day
  • different separators ('/' '-' '.' ' ')
  • whether the string represents month & year; month, day & year; or month, day, year & time
  • etc., etc.

(If any math gurus want to calculate the total number of permutations, knock yourself out. Let us know the result.)
Take, for example, the string "200310". Is that:
  • The month of October 2003
  • Oct 1, 2003
  • March 10, 2020
  • March 10, 0020
  • Oct 3, 2020
  • today at 8:03:10pm
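
To make that ambiguity concrete, here is a minimal sketch that feeds the same string to several SimpleDateFormat patterns. The class name `AmbiguousDateDemo`, the method `interpretations`, and the list of patterns are illustrative assumptions, not part of any library. The sketch also assumes two-digit years belong to 2000-2099 and works in UTC so the results do not depend on the machine's time zone.

```java
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;
import java.util.TimeZone;

final class AmbiguousDateDemo {

    // Hypothetical candidate patterns, one per plausible reading of the text.
    static final String[] PATTERNS = { "yyyyMM", "yyMMdd", "yyddMM", "HHmmss" };

    private static final TimeZone UTC = TimeZone.getTimeZone("UTC");

    /** Parses the same text under every pattern; maps pattern -> result (or "unparseable"). */
    static Map<String, String> interpretations(String text) {
        SimpleDateFormat out = new SimpleDateFormat("yyyy-MM-dd HH:mm:ss", Locale.ROOT);
        out.setTimeZone(UTC);
        Map<String, String> results = new LinkedHashMap<>();
        for (String pattern : PATTERNS) {
            SimpleDateFormat in = new SimpleDateFormat(pattern, Locale.ROOT);
            in.setTimeZone(UTC);
            in.setLenient(false);
            // Assumption: two-digit years fall in 2000-2099 (start = 2000-01-01T00:00:00Z).
            in.set2DigitYearStart(new Date(946684800000L));
            try {
                results.put(pattern, out.format(in.parse(text)));
            } catch (ParseException e) {
                results.put(pattern, "unparseable");
            }
        }
        return results;
    }

    public static void main(String[] args) {
        interpretations("200310").forEach((p, r) -> System.out.println(p + " -> " + r));
    }
}
```

Every one of these patterns accepts "200310" without complaint, and each yields a different instant. A time-only pattern such as "HHmmss" fills in the date as 1970-01-01 rather than today, so even the "today at 8:03:10pm" reading needs extra handling.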

Toss in the ongoing argument between my mother and me as to whether September is abbreviated "Sept" or "Sep" (let alone "Sept.") and the craziness just goes on and on.
IMHO, you really need some kind of structure to the string in order to make parsing it a reasonably surmountable task. And if you are designing the input form, that gives you the opportunity to impose one. It's when you have a raw collection of preexisting data, and the data has no common structure or format, that things can get very hard and tricky. And even then, it is usually a case of having multiple formats present, not just arbitrary data.
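
For the "multiple formats present" case, one common approach is to try a short list of known patterns in order and accept the first that consumes the whole input. The following is only a sketch under that assumption: `MultiFormatParser`, `parseOrNull`, and the four patterns are hypothetical names and choices, and the order of the list decides how an ambiguous input such as "03/10/2003" gets read.

```java
import java.text.ParsePosition;
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.Locale;
import java.util.TimeZone;

final class MultiFormatParser {

    // Hypothetical list of formats known to occur in the data, tried in order.
    static final String[] PATTERNS = { "yyyy-MM-dd", "dd/MM/yyyy", "MMM d, yyyy", "d MMMM yyyy" };

    /** Returns the date (midnight UTC) from the first pattern matching the whole text, or null. */
    static Date parseOrNull(String text, Locale locale) {
        String input = text.trim();
        for (String pattern : PATTERNS) {
            SimpleDateFormat fmt = new SimpleDateFormat(pattern, locale);
            fmt.setTimeZone(TimeZone.getTimeZone("UTC"));
            fmt.setLenient(false); // reject impossible dates such as Feb 30
            ParsePosition pos = new ParsePosition(0);
            Date parsed = fmt.parse(input, pos);
            // Accept only if the pattern consumed the entire input, not just a prefix.
            if (parsed != null && pos.getIndex() == input.length()) {
                return parsed;
            }
        }
        return null;
    }

    public static void main(String[] args) {
        for (String s : new String[] { "2003-10-03", "03/10/2003", "Oct 3, 2003", "200310" }) {
            System.out.println(s + " -> " + parseOrNull(s, Locale.ENGLISH));
        }
    }
}
```

Checking the ParsePosition matters here: the one-argument parse(String) is satisfied by a matching prefix and silently ignores trailing text.
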
Take a look at the examples "e320. Formatting a Date Using a Custom Format" and "e323. Formatting and Parsing a Date for a Locale" for some guidance and sample code. Also look at the setLenient() method of the DateFormat class in combination with the above examples; note that date parsing is lenient by default. You may be pleasantly surprised how lenient the parser can be. It's not all-knowing, but it does a pretty good job.
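
As a rough illustration of both points (this is not the code from the linked examples), the sketch below parses with a locale's built-in SHORT date style and shows what setLenient() changes. `LenientParsingDemo` and its method names are hypothetical. The exact SHORT patterns come from the JDK's locale data, so treat the locale behavior as an assumption to verify on your own JDK.

```java
import java.text.DateFormat;
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.Locale;
import java.util.TimeZone;

final class LenientParsingDemo {

    private static final TimeZone UTC = TimeZone.getTimeZone("UTC");

    /** Parses text with the locale's built-in SHORT date style; null if it does not parse. */
    static String parseForLocale(String text, Locale locale) {
        DateFormat fmt = DateFormat.getDateInstance(DateFormat.SHORT, locale);
        return parseToIso(fmt, text, false);
    }

    /** Parses text as MM/dd/yyyy, with the parser's leniency switched on or off. */
    static String parseUsStyle(String text, boolean lenient) {
        return parseToIso(new SimpleDateFormat("MM/dd/yyyy", Locale.ROOT), text, lenient);
    }

    private static String parseToIso(DateFormat fmt, String text, boolean lenient) {
        fmt.setTimeZone(UTC);
        fmt.setLenient(lenient);
        try {
            Date parsed = fmt.parse(text);
            SimpleDateFormat iso = new SimpleDateFormat("yyyy-MM-dd", Locale.ROOT);
            iso.setTimeZone(UTC);
            return iso.format(parsed);
        } catch (ParseException e) {
            return null;
        }
    }

    public static void main(String[] args) {
        System.out.println(parseForLocale("10/03/2003", Locale.US)); // US style puts the month first
        System.out.println(parseForLocale("10/03/2003", Locale.UK)); // UK style puts the day first
        System.out.println(parseUsStyle("02/30/2003", true));  // lenient: Feb 30 rolls over into March
        System.out.println(parseUsStyle("02/30/2003", false)); // strict: rejected, so null
    }
}
```

Lenient parsing is forgiving about out-of-range fields, which is convenient for sloppy input but also means a typo can quietly become a different, valid date.
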
Personally, I try to avoid using a single "Date" input field/parameter on a user input form. I always use multiple input fields/parameters for month, day, and year (and hour and minute if needed), and even then I often use drop-down selections rather than text boxes. Using separate parameters makes validation a lot easier. It also allows for easier I18N in that I can change the order of the fields on the input form as needed for a particular locale. If you do use a single "date" input field, you simply must tell the user what format to use, and then validate that String before passing it on to your DateFormat. (Who among us hasn't entered Feb 30th into a web form just to see if the programmer is using proper validation? And if I am the only one, then maybe I do need to get a life like my dog keeps telling me :roll: )
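
With separate month, day, and year parameters, the Feb 30th check can be a few lines. Here is one possible sketch using a non-lenient GregorianCalendar; `DateFieldValidator` and `isValidDate` are hypothetical names, and the month argument is assumed to be 1-12 as a user would enter it.

```java
import java.util.Calendar;
import java.util.GregorianCalendar;
import java.util.TimeZone;

final class DateFieldValidator {

    /** Returns true if year/month/day (month is 1-12) name a real calendar date. */
    static boolean isValidDate(int year, int month, int day) {
        // UTC avoids spurious failures from daylight-saving gaps in the local zone.
        Calendar cal = new GregorianCalendar(TimeZone.getTimeZone("UTC"));
        cal.clear();
        cal.setLenient(false);         // out-of-range fields now raise an exception
        cal.set(year, month - 1, day); // Calendar months are zero-based
        try {
            cal.getTime();             // forces the fields to be validated
            return true;
        } catch (IllegalArgumentException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(isValidDate(2003, 2, 30)); // Feb 30th
        System.out.println(isValidDate(2004, 2, 29)); // leap day
    }
}
```

Because the fields arrive already separated, there is no format to guess; the only question left is whether the combination exists on the calendar.
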
Those are my thoughts on the subject. Others may have additional comments, or know of something that I am not aware of. I look forward to their opinions as well...
Dirk Schreckmann
Posts: 7023
Welcome to JavaRanch, Basil!
I'm moving this to the Other Java APIs forum...