Jump to content
Claris Engage 2025 - March 25-26 Austin Texas ×

This topic is 7478 days old. Please don't post here. Open a new topic instead.

Recommended Posts

  • Newbies
Posted

Here's problem that has me stumped. I have a giant database of people, with addresses, genders, etc. I'd like to get this to the point where I have one record for each unique household where each household record includes the number of people in the household by gender, i.e. MM or MF or FF, etc. Anyone have thoughts one how to efficiently do this?

Posted

Sounds reasonable.

If you would provide some information about your current setup (your files and the relationships between them), you could get some advice on how to get there from where you are now.

  • Newbies
Posted

It's simply a flat file, albeit one with a very large number of records. The records are all the people in a given area, with each person's address, gender, and other extraneous fields. Gender is either M, F or blank. I'd like to create a record for each unique household that tells me how many men or women live at that address.

Posted

Oh, okay. So you could start by splitting this into two files, "Households" and "People", have a HouseholdID field in each table that connects the two together.

If you create a HouseholdID in your file, then save a copy as "Households" and eliminate all but one of duplicate addresses (finding them with !). Then re-serialize HouseholdID in the "Households" file, and write a script to fill in the HouseholdID in your people table.

Then create a relationship between the two HouseholdID fields, and you should be able to delete the address information from people and use the related address instead.

At that point, a similar script *could* fill in all members as a field of "Households", but that probably makes more sense to determine that information dynamically when you need it.

Hope this helps.

  • Newbies
Posted

I understand how to get the unique households transported over to a new file (and in theory how to create a HouseholdID field).

As for getting my gender info, how would I relate my file with unique households to my original? Assuming I create a relationship on address, how to figure out in the unique household file how many men and women live in each household?

Sorry if I'm missing something obvious.

Posted

Once you have the relational structure worked out, use something like this to get the counts of each gender in Household:

In People:

GenderM (calculation,number)= Gender="M" //This evaluates to 1 if Gender = M

GenderF (calculation,number)= Gender="F" //This evaluates to 1 if Gender = F

In Household:

CountGenderM (calculation,number)= Count(People::GenderM)

CountGenderF (calculation,number)= Count(People::GenderF)

Posted

Is this really what you want to do, though? Would it be easier to just have the one file of People and create a relationship that would group people within the file by unique household? Then you could analyze each household that is comprised of the people in the database.

You can create a calc field for each person:

Household = Address1 & "_" & Address2 & "_" & ZipCode

So each person at the same address in the same zip code is considered a member of the same houshold.

Then create a self-join on Household. Using the count function would let you figure out how many people in each household, and the gender makeup, etc.

Dan

Posted

DanBrill said:

Is this really what you want to do, though? Would it be easier to just have the one file of People and create a relationship that would group people within the file by unique household? Then you could analyze each household that is comprised of the people in the database.

And when it's time to change the address of a family?

Seems better to me to normalize the data into separate tables. This will be cleaner in the long run (after the data is cleaned up). Sometimes different families live at the same address or one family vacates an address (doesn't tell you) and another family moves in. It's best to make sure your system doesn't group people to a household just because of their address.

Posted

I'd agree, if you are creating a system from scratch, yes, bust out the addresses into their own table. But if you already have thousands of records in a flat file, grouping them by household may be faster for the purpose at hand.

This topic is 7478 days old. Please don't post here. Open a new topic instead.

Create an account or sign in to comment

You need to be a member in order to leave a comment

Create an account

Sign up for a new account in our community. It's easy!

Register a new account

Sign in

Already have an account? Sign in here.

Sign In Now
×
×
  • Create New...

Important Information

By using this site, you agree to our Terms of Use.