June 1, 200421 yr Newbies Here's problem that has me stumped. I have a giant database of people, with addresses, genders, etc. I'd like to get this to the point where I have one record for each unique household where each household record includes the number of people in the household by gender, i.e. MM or MF or FF, etc. Anyone have thoughts one how to efficiently do this?
June 1, 200421 yr Sounds reasonable. If you would provide some information about your current setup (your files and the relationships between them), you could get some advice on how to get there from where you are now.
June 1, 200421 yr Author Newbies It's simply a flat file, albeit one with a very large number of records. The records are all the people in a given area, with each person's address, gender, and other extraneous fields. Gender is either M, F or blank. I'd like to create a record for each unique household that tells me how many men or women live at that address.
June 1, 200421 yr Oh, okay. So you could start by splitting this into two files, "Households" and "People", have a HouseholdID field in each table that connects the two together. If you create a HouseholdID in your file, then save a copy as "Households" and eliminate all but one of duplicate addresses (finding them with !). Then re-serialize HouseholdID in the "Households" file, and write a script to fill in the HouseholdID in your people table. Then create a relationship between the two HouseholdID fields, and you should be able to delete the address information from people and use the related address instead. At that point, a similar script *could* fill in all members as a field of "Households", but that probably makes more sense to determine that information dynamically when you need it. Hope this helps.
June 1, 200421 yr Author Newbies I understand how to get the unique households transported over to a new file (and in theory how to create a HouseholdID field). As for getting my gender info, how would I relate my file with unique households to my original? Assuming I create a relationship on address, how to figure out in the unique household file how many men and women live in each household? Sorry if I'm missing something obvious.
June 1, 200421 yr Once you have the relational structure worked out, use something like this to get the counts of each gender in Household: In People: GenderM (calculation,number)= Gender="M" //This evaluates to 1 if Gender = M GenderF (calculation,number)= Gender="F" //This evaluates to 1 if Gender = F In Household: CountGenderM (calculation,number)= Count(People::GenderM) CountGenderF (calculation,number)= Count(People::GenderF)
June 1, 200421 yr Is this really what you want to do, though? Would it be easier to just have the one file of People and create a relationship that would group people within the file by unique household? Then you could analyze each household that is comprised of the people in the database. You can create a calc field for each person: Household = Address1 & "_" & Address2 & "_" & ZipCode So each person at the same address in the same zip code is considered a member of the same houshold. Then create a self-join on Household. Using the count function would let you figure out how many people in each household, and the gender makeup, etc. Dan
June 2, 200421 yr DanBrill said: Is this really what you want to do, though? Would it be easier to just have the one file of People and create a relationship that would group people within the file by unique household? Then you could analyze each household that is comprised of the people in the database. And when it's time to change the address of a family? Seems better to me to normalize the data into separate tables. This will be cleaner in the long run (after the data is cleaned up). Sometimes different families live at the same address or one family vacates an address (doesn't tell you) and another family moves in. It's best to make sure your system doesn't group people to a household just because of their address.
June 2, 200421 yr I'd agree, if you are creating a system from scratch, yes, bust out the addresses into their own table. But if you already have thousands of records in a flat file, grouping them by household may be faster for the purpose at hand.
Create an account or sign in to comment