hartmut Posted December 9, 2007

Hello,

I am trying to separate some yellow-page business listings from the internet into fields: business name, address, city, state, zip, and phone. The sample listings come from http://www.yellowpagecity.com

The problem is that the listings there look like this:

Canastota Concrete Co., Inc. MAP 306 E Walnut St Oneida NY 13421 315-363-4240

The word MAP has to be removed in every case. The bigger problem is that a listing sometimes gives only the city and sometimes the full address, and there is nothing between the address and the city to show where the address leaves off and the city starts. I could go by RightWords, but some cities are two words, like New Berlin.

So for the listing above I have to remove the word MAP and then separate the rest into the fields I have mentioned. How do I deal with breaking up the address when there is no comma between the address and the city?
comment Posted December 9, 2007

This can't really be done by a computer. You could look for common markers, e.g. "St", "Dr", "Ave", "Blvd", etc., and/or the last number before the state, and/or the last single-character word, and perhaps a few others. But it would be an awful lot of work, and ultimately the result would still have to be checked and corrected by a human.
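
For illustration, here is a minimal sketch of the marker-based heuristic described above. It is written in Python only because the thread does not name a tool, so the language is an assumption. The names `parse_listing`, `STREET_SUFFIXES`, and `needs_review` are hypothetical, and the suffix list is deliberately incomplete. The sketch peels the phone, zip, and state off the right-hand end, splits on the word MAP, and then treats everything up to the last street-suffix word as the address. Anything it cannot split cleanly is flagged so a human can check it.

```python
import re

# Assumed, incomplete set of street-suffix markers; extend it for real data.
STREET_SUFFIXES = {"St", "Dr", "Ave", "Blvd", "Rd", "Ln", "Ct", "Hwy", "Way", "Pl"}

# State, zip and phone are fixed-format, so they are matched from the right.
TAIL = re.compile(
    r"^(?P<head>.*?)\s+(?P<state>[A-Z]{2})\s+(?P<zip>\d{5})\s+"
    r"(?P<phone>\d{3}-\d{3}-\d{4})$"
)

def parse_listing(line):
    """Split one listing into fields; return None if the layout is unexpected."""
    m = TAIL.match(line.strip())
    if m is None:
        return None
    # Everything before the MAP marker is the business name.
    name, sep, location = m.group("head").partition(" MAP ")
    if not sep:
        return None
    words = location.split()
    # The address ends at the last word that looks like a street suffix.
    split_at = 0
    for i, word in enumerate(words):
        if word.rstrip(".") in STREET_SUFFIXES:
            split_at = i + 1
    address = " ".join(words[:split_at])
    city = " ".join(words[split_at:])
    # Flag doubtful splits: no city left over, or digits present but no suffix found.
    has_digit = any(ch.isdigit() for ch in location)
    needs_review = (not city) or (split_at == 0 and has_digit)
    return {
        "name": name.strip(),
        "address": address,
        "city": city,
        "state": m.group("state"),
        "zip": m.group("zip"),
        "phone": m.group("phone"),
        "needs_review": needs_review,
    }

if __name__ == "__main__":
    sample = ("Canastota Concrete Co., Inc. MAP 306 E Walnut St "
              "Oneida NY 13421 315-363-4240")
    print(parse_listing(sample))
```

A city-only listing has no suffix word, so the whole remainder becomes the city, which covers two-word cities like New Berlin. The heuristic still fails on addresses without a recognised suffix (for example a street named "Broadway") and on cities that begin with a suffix-like word (for example "St Louis"), which is why the `needs_review` flag exists and why a manual pass remains necessary.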