
US Address parsing?

 
Ranch Hand
Posts: 18944
Maybe off-topic:
Anyone here done something with US addresses? My head hurts reading USPS publications...
Basically, I need to parse an address from a String (TextField). The address comes without a ZIP code (although one would help), and the parser has to return 4 values (st_num, st_prefix, st_name, st_suffix - don't ask, that's what I have to do).
The programming part is not a problem; my problem is the different ways of typing an address... and different addresses in general.
So, if someone has already gotten through this pain, please give me a few hints (like where to find source code... j/k).
------------------

Darko
 
Sheriff
Posts: 3341
I've gone through this and am actually surprised that there appears to be a need. There are actually more parts to a street address line:
House Number
Left Direction
Street name
Suffix
Right Direction
Unit/apt number
The easiest way to deal with this is to split the address into tokens using a StringTokenizer object and check each position for tokens that meet the criteria for each part.
House Number: always in the first position and all digits.
Left Direction: always in the second position and one of (N, N., S, E, W, NW, ...).
Unit/Apt: the last position, where the token begins with # or the previous token is the word Unit or Apt.
Right Direction: precedes Unit/Apt and is one of the values listed above for Left Direction.
Suffix: precedes Right Direction and is in the list approved by USPS.
Street Name: finally, everything else.
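The positional rules above can be sketched roughly as follows. This is only a minimal illustration, not production code: the class name AddressLineParser, the map keys, and the short suffix list are assumptions made up for the example, and the real USPS suffix list is much longer. It also treats the left direction as optional and only recognizes abbreviated directions.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.StringTokenizer;

// Hypothetical helper: splits one street address line by token position.
final class AddressLineParser {
    private static final Set<String> DIRECTIONS = new HashSet<>(Arrays.asList(
            "N", "S", "E", "W", "NE", "NW", "SE", "SW"));
    // Illustrative subset only (assumption); not the full USPS suffix list.
    private static final Set<String> SUFFIXES = new HashSet<>(Arrays.asList(
            "ST", "AVE", "BLVD", "RD", "DR", "LN", "CT", "PL", "WAY"));
    private static final Set<String> UNIT_WORDS = new HashSet<>(Arrays.asList("UNIT", "APT"));

    // Upper-case a token and drop one trailing period ("N." becomes "N").
    private static String norm(String token) {
        String t = token.toUpperCase();
        return t.endsWith(".") ? t.substring(0, t.length() - 1) : t;
    }

    static Map<String, String> parse(String line) {
        List<String> tokens = new ArrayList<>();
        StringTokenizer st = new StringTokenizer(line, " ,\t");
        while (st.hasMoreTokens()) {
            tokens.add(st.nextToken());
        }

        String house = "", left = "", suffix = "", right = "", unit = "";
        int start = 0;              // first token not yet assigned
        int end = tokens.size();    // one past the last token not yet assigned

        // House number: first token, digits only.
        if (start < end && tokens.get(start).matches("\\d+")) {
            house = tokens.get(start++);
        }
        // Left direction: next token, if something remains for the street name.
        if (end - start > 1 && DIRECTIONS.contains(norm(tokens.get(start)))) {
            left = norm(tokens.get(start++));
        }
        // Unit: last token starting with '#', or preceded by "Unit"/"Apt".
        if (end - start > 1 && tokens.get(end - 1).startsWith("#")) {
            unit = tokens.get(--end);
        } else if (end - start > 2 && UNIT_WORDS.contains(norm(tokens.get(end - 2)))) {
            unit = tokens.get(end - 2) + " " + tokens.get(end - 1);
            end -= 2;
        }
        // Right direction: token just before the unit.
        if (end - start > 1 && DIRECTIONS.contains(norm(tokens.get(end - 1)))) {
            right = norm(tokens.get(--end));
        }
        // Suffix: token just before the right direction.
        if (end - start > 1 && SUFFIXES.contains(norm(tokens.get(end - 1)))) {
            suffix = norm(tokens.get(--end));
        }

        Map<String, String> parts = new LinkedHashMap<>();
        parts.put("houseNumber", house);
        parts.put("leftDirection", left);
        // Street name: everything that is left over.
        parts.put("streetName", String.join(" ", tokens.subList(start, end)));
        parts.put("suffix", suffix);
        parts.put("rightDirection", right);
        parts.put("unit", unit);
        return parts;
    }

    public static void main(String[] args) {
        System.out.println(parse("123 N Main St SW Apt 4"));
        System.out.println(parse("456 Oak Ave #12"));
    }
}
```

Each check leaves at least one token for the street name, so an input like "12 North St" is not emptied out by the direction or suffix rules. Spelled-out directions, fractional house numbers and the like would still need extra handling.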
Hopefully you are starting with a standardized address. If not, you can hope for at most an 80% success rate. There are some software packages that do address normalization, and the good ones are quite expensive. I can see why: the one that I'm working with actually does fuzzy matching against a database of 90% of US addresses, and obtaining and maintaining that database is exhausting work.
Good luck!
 
Anonymous
Ranch Hand
Posts: 18944
Thanks for your response.
So, I need some kind of address normalizer to start the thing?
Where did you get yours?
What do you think about USPS software (forgot the name)?
What about ZP4?
Those 4 elements were given; they said they can't take anything else ("Don't worry about apt #.."?!?!??!). Hm, the pleasures of contracting from 2000 miles away: they never give you what you need.
Anyway, thx, it seems that I'll have to convince these people to spend some money (or I'll make my own thing, looks that way).
 