December 9, 2007
Hello, I am trying to separate some yellow page business listings from the internet into fields: Businessname, address, city, state, zip and Phone. The sample addresses come from http://www.yellowpagecity.com. The problem is that the addresses listed there look like this:
Canastota Concrete Co., Inc. MAP 306 E Walnut St Oneida NY 13421 315-363-4240
The word MAP has to be removed in every case. The big problem is that a listing sometimes gives only the city and sometimes the full address, and there is nothing between the address and the city to show where the address leaves off and the city starts. I could go by rightwords, but sometimes cities are two words, like New Berlin. So in the listing above I have to remove the word MAP and then separate what is left into the fields I have mentioned. How do I deal with breaking up the address when there is no comma between the address and the city?
December 9, 2007
Can't really be done by a computer. You could look for common markers, e.g. "St", "Dr", "Ave", "Blvd", etc., and/or the last number before the state, and/or the last single-character word, and perhaps a few others. But it would be an awful lot of work, and ultimately the result would still have to be checked and corrected by a human.
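To give an idea of what the marker approach could look like, here is a rough sketch in Python (not the poster's tool, just an illustration). It assumes every listing ends with a two-letter state, a five-digit zip and a phone number in the 315-363-4240 style, and that the word MAP always follows the business name. The suffix list and the name `parse_listing` are made up for this example, and the list is far from complete. It peels the predictable fields off the right-hand end first, then treats everything up to the last street suffix as the address and whatever follows as the city.

```python
import re

# Hypothetical, incomplete list of street markers; extend it for real data.
STREET_SUFFIXES = {"St", "Dr", "Ave", "Blvd", "Rd", "Ln", "Hwy", "Ct", "Pl", "Way"}

# Assumed layout of the right-hand end: state, zip, phone.
TAIL = re.compile(
    r"^(?P<head>.*?)\s+(?P<state>[A-Z]{2})\s+(?P<zip>\d{5})"
    r"\s+(?P<phone>\d{3}-\d{3}-\d{4})\s*$"
)

def parse_listing(line):
    """Split one listing into fields, or return None if the layout is unexpected."""
    m = TAIL.match(line.strip())
    if m is None:
        return None  # leave for manual review
    # The standalone word MAP separates the business name from the rest.
    name, sep, rest = m.group("head").partition(" MAP ")
    if not sep:
        return None
    words = rest.split()
    split_at = 0  # 0 means "no street address, city only"
    # Only look for a street suffix when the text starts with a house number.
    if words and words[0][0].isdigit():
        # Skip the final word so that at least one word is left for the city.
        for i, word in enumerate(words[:-1]):
            if word.rstrip(".") in STREET_SUFFIXES:
                split_at = i + 1  # keep the last suffix found
    return {
        "name": name.strip(),
        "address": " ".join(words[:split_at]),
        "city": " ".join(words[split_at:]),
        "state": m.group("state"),
        "zip": m.group("zip"),
        "phone": m.group("phone"),
    }

print(parse_listing(
    "Canastota Concrete Co., Inc. MAP 306 E Walnut St Oneida NY 13421 315-363-4240"
))
```

This will still guess wrong on some listings, for example a street with no suffix from the list, a PO box, or a city whose name begins with one of the markers (a city starting with "St" right after a street ending in "St"), so the output would still need to be checked by a human.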