January 10, 20206 yr Using: FMPA 16 Good evening - I have a text field that's auto-populated from an xls file. It's up to six lines long, and contains various data per record that looks like any of the following: (SAMPLE 1) ANYUSER RICK & MELANIE 123 ANYWHERE ST LAS VEGAS NV 89108 (SAMPLE 2) ANYUSER RICK ANYUSER MELANIE & JANINE 123 ANYWHERE ST LAS VEGAS NV 89108 (SAMPLE 3) ANYUSER RICK 50% ANYUSER MELANIE 50% C/O THE RICK AND MEL TRUST 123 ANYWHERE ST LAS VEGAS NV 89108 (SAMPLE 4) THE ANYUSER COMPANY 123 ANYWHERE ST COLDSTREAM BC V1B 3W8 CANADA What I need to accomplish: I need to non-manually populate this data into separate fields. Those fields are OWNER1, OWNER2, ADDY1, ADDY2, CITY, ST, ZIP, COUNTRY. Unfortunately, I cannot alter the data any further before it gets to my database. For the life of me, I cannot figure out the best way to make this happen. Does anyone have any suggestions for me? Thank you in advance!
January 10, 20206 yr I don't think it's possible to automate this process without introducing some additional rules that are not apparent from the given examples. I notice that the last line can be either the country or the (US?) state and ZIP code. We need a way to determine which one it is - perhaps we can assume that if the line contains any digits, it is the latter? Next there is the problem of deciding whether the 2nd line is owner #2 or the street address (this is assuming that the number of owners is always either 1 or 2). I don't think a computer is able to look at "ANYUSER MELANIE & JANINE" or "C/O THE RICK AND MEL TRUST" and make this decision on its own, without some helper rule. Finally, the line "COLDSTREAM BC V1B 3W8" is going to be very difficult to parse into city (which can be more than one word), state and ZIP without some additional rules to determine where the city name ends and where the ZIP code starts. Edited January 10, 20206 yr by comment
January 10, 20206 yr Author A couple things that may help re - your response: 1) The Country CANADA is usually (lets call it 98% of the time another country appears) the only other variable country besides the US. 2) Sample 1 is the most common occurrence - probably occurs 75% of the time. 3) NV is the most common state, by about 95%. Maybe we could work our way up from the bottom somehow by detecting and parsing the state and zip code first, then move up to city, then street? Also, is there a way to detect the numeric portion of the street to differentiate it from the name(s) above it?
January 11, 20206 yr Under certain assumptions (which I hope will be self-evident), you could something like: Check if the last line matches the pattern of "@@ #####"; if yes, set the Country field to USA (or leave it empty?), the State field to the first word of the last line, the ZIP field to the second word of the last line, and the City to the entire line before last . Otherwise , set the Country field to the entire last line. If the last line is "Canada", parse the line before last as follows: the last 2 words (or 7 characters) go to the ZIP field, the 3rd word from right is the State, and the rest is the City. If the last line is something else, then we don't know how to handle the line before last. Now we need to look at the remaining lines: if there are 2 remaining lines, then the first of them is Owner1 and the other one is Address 1. If there are 4 of them, then the first 2 are owners and the other two are the address. If there are 3 remaining lines, then check if the 2nd line begins with a digit character (or with a word that consist of digits only?). If yes, then there is one owner and two address lines, otherwise it's two owners and one address line. The problem with this approach is, of course, that if will fail colossally when any of these assumptions is not true, as well as the missing instructions for countries other than USA and Canada. So at best it can serve as a first draft for human review. Edited January 11, 20206 yr by comment
Create an account or sign in to comment