Jump to content

identifying households in data (complex find?)


This topic is 7310 days old. Please don't post here. Open a new topic instead.

Recommended Posts

  • Newbies

Hello Folks--

I have a (very simple) database of voter information (first & last names, street number, street, party, etc).

I need to identify households with members of the same party.

That is, street name matches, street number matches, and party matches.

I want to return all such households:

  • Smith John 2 Main Street Springfield Democrat
    Smith-Jones Mary 2 Main Street Springfield Democrat
    Flintstone Wilma 100 Elm Springfield City Unenrolled
    Flintstone Fred 100 Elm Street Springfield Unenrolled

These do not match, and should not be returned:

  • Rubble Betty 102 Elm Street Springfield Democrat
    Rubble Barney 102 Elm Street Springfield Republican
    [different party]

    Simpson Homer 742 Evergreen Terrace Springfield Democrat
    Simpson OJ 742 Evergreen Terrace Capitol City Democrat
    [different city]

I'm at a loss for how to even start this, nevermind how to implement it. I'm thinking some kind of stepping through street names, then numbers, then party, but....

I looked through the posts in this forum and couldn't find (!) anything directly relevant. I did find lots of other interesting and useful stuff.

Thanks for any insight or suggestions.

-- Lane

FileMaker Version: 6

Platform: Windows XP

Link to comment
Share on other sites

Lane,

I've been looking for a solution for your question, but I have a few question:

LBourn said:

(first & last names, street number, street, party, etc)

Is there more to the list than the example gives? Without knowing what the "etc." is, it is more difficult to come up with a working solution. Thanks!

Paul

Link to comment
Share on other sites

  • Newbies

Hi Paul--

I've listed all fields in the database below, but the only ones I care about (because they're needed for the searching) are street number, street, city, and party, plus names. Those are the ones that determine whether someone is in a household, and the party info determines if they're the same party (once a household is detected).

Fields include:

  • VoterID (UNID)
    FirstName
    LastName
    MiddleName
    Suffix
    StreetNumber
    StreetNumberLetter (like the 'a' in 54a Main St)
    StreetName
    AptNumber
    City
    ZIP
    [same as all address info above repeated for Mailing address]
    Party
    Gender
    DateOfBirth
    DateOfRegistration
    WardNumber
    PrecinctNumber
    CongressionalDistrictNumber
    StateSenateDistrictNumber
    StateRepDistrictNumber
    VoterStatus

There are about 4000 records.

Does this help?

-- Lane

Damocles said:

Is there more to the list than the example gives? Without knowing what the "etc." is, it is more difficult to come up with a working solution. Thanks!

FileMaker Version: 6

Platform: Windows XP

Link to comment
Share on other sites

Okay, I think I misunderstood the question. crazy.gif Do you already have the information in separate fields, and you need to sort out the records that have the same household and party affiliation?

OR

Is the data all in one field, and you need to parse it out into separate fields THEN sort for households/party? I'd send yo my "solution" so far, but I probably solved the wrong problem, which is just a polite way of saying I've created something cool, but useless. smile.gif

Let me know, and if nobody has it solved by tomorrow, I'll take a new look at the issue.

Link to comment
Share on other sites

  • Newbies

The first choice you gave:

All the data is in separate fields, and needs to be sorted into households with the same party affiliation.

I don't know which situation you solved; I hope it's this one. smile.gif

"Cool and useless" has it's place too.

Link to comment
Share on other sites

Good Morning.

I'm going to say that the biggest problem in doing this is that the data entries are not consistent. From your earlier example:

Flintstone Wilma 100 Elm Springfield City Unenrolled

Flintstone Fred 100 Elm Street Springfield Unenrolled

If these are supposed to show the same household, I'm guessing that someone (probably you) will have to go through and standardize the addresses, so my "solutions" will probably aim that direction. I'll use the sample data and get back to you soon. For now, I'm attempting to create an identifier that is unique enough to find people who are probably in the same house, without ruling out matches that are not exact.

This will allow someone (you, presumably) to sort the records so that if you do have to manually review the addresses, they're at least in an easy order to review.

Paul

Link to comment
Share on other sites

  • Newbies

No no -- the data IS standardized and the street names are consistently consistent. Except when I type an example. frown.gif

The example should have been Elm Street for any and all people living on Elm Street. This is voter ID data which is highly standardized by the State (of Massachusetts). I have only found one inconsistency regarding street addresses. Granted, it was a very quick check, but it's indicative of a small (enough for me) error rate.

However, if there is a way to be "fuzzy" about it, that'd be a better solution, since this data WILL need to be viewed by a human to confirm the results (households with same party issues).

Additional issue that might mess things up:

Sometimes there is more than just two people living at an address (parents with kids).

This another reason why hand-confirmation will be required. If the householding by address works, and the party can be checked for (at least 2 of same party in a househould), then the cases where there are kids at home can be plucked out by hand easily enough.

I hope the kids issue doesn't screw things up. I spaced it in one of the examples, and frankly, I can't believe I messed up the Elm / Elm Street difference. And I worked really hard on that dang list of examples! :-)

-- Lane

Link to comment
Share on other sites

Lane,

I think this is what you want. It sorts voters by City, Household (an aggregation of StreetNumber and Street), Party, and Age(DateofBirth). I created a field called CountRelated at the bottom of Layout #1 that shows how many people in the database have the same city, address, and party affiliation. If you want to find out how many households in your database have 2 or more people with the same party affiliation, perform a Find with ">1" in the "CountRelated" field.

The only script for this database will do the find automatically. It's set up to find all the households with >1 residents who have the same party affiliation. The results are broken down by age. Anyone who is old enough to have kids that are old enough to vote is assumed to be a parent. It's a quaint convention, but one that could just as easily be labelled as "Generation 1 or Generation 2" rather than parents/children. To change the labels, look in the field definition calculation for GenerationIndicator.

There's a self-join relationship based on the RelateToSelf field (Calculated to be Household & Party & City & GenerationIndicator) That way, any people who have the same household, party, cityand generation indicator are grouped together. It then counts the records that are similar.

I'm having a horrible time explaining this. Just play with the file and let me know if you have questions. I'm hoping that by posting this, we'll get feedback from someone with a little more knowledge (who can explain the solution better, or provide a better solution)

Hope this works. Let me know if anything either doesn't work or is just plain wrong, or if there is more you might like to see.

Paul

Voter Sorting.zip

Link to comment
Share on other sites

  • Newbies

Woo-hoo!

This is great -- I tried the sample you offered and it fits the bill. But more importantly, I got it to work on my data!

I think it would be a pain and a half for a dataset larger than mine (4000 records) since there's so much hand sorting, but it worked great for me. It's also a good jumping-off point for tweaking.

I'll post my tweaks in follow-ups.

Thanks Paul!

-- Lane

PS

BTW, you were correct about your "horrible time explaining this," but I trudged through and got it. smile.gif The trickiest part *I* think was that the data was sorted in the sample and I didn't have the same sort in mine (I *think* that was an issue, although I'm not certain). That and the fact that you can't copy an entire layout from one DB to another. You're good!

FileMaker Version: 6

Platform: Windows XP

Link to comment
Share on other sites

Here are screenshot pictures that are worth more than any explanation I might have. The script was designed so you should not have to do too much (any) sorting by hand. Make sure that the last "Sort" you do before editing the script looks like the picture attached. That way, the script will always default to the earlier sort order, which should preclude any need for manual sorting. smile.gif

Let me know if you have any other "challenges" that I might smirk.gif be able to help with...

Paul

Pictures not words.zip

Link to comment
Share on other sites

This topic is 7310 days old. Please don't post here. Open a new topic instead.

Create an account or sign in to comment

You need to be a member in order to leave a comment

Create an account

Sign up for a new account in our community. It's easy!

Register a new account

Sign in

Already have an account? Sign in here.

Sign In Now
×
×
  • Create New...

Important Information

By using this site, you agree to our Terms of Use.