Need help in normalizing

Brudderman · July 9, 2005

I receive election results in a flat file, and am able to print the basic reports that I need. But as others have suggested to me, I want to go ahead try and normalize the database. I need help for that.

Here are the flat file's fields in the single table I have now: ElectionYear, County, Location, Contest, Candidate, Votes. Right now I have about 124,000 records for the general elections in 2002 and 2004. So there is a record for each candidate at each voting location in each county in each election. Thus each entity appears many times.

In a normalized structure, I can see these tables: County (for 75 counties), Location (3,124 locations within the counties, and some Locations change names from election to election), Contest (different contests for the elections), and Candidate (some candidates appear in both elections). But would I put votes in a separate table or in one of the other tables?

Does this simple approach make sense as a start, or would another arrangement be better?

Then how should I populate the new tables? In other words, how can I easily get a list of the unique 3,124 locations into the Location table? If I can get the basic data into the tables, I think I can create the proper keys for relationships.

I'm sure this is simpler than I'm making it, but I just can't seem to get a handle on it. Any suggestions would be appreciated.

Brudderman

Fenton · July 10, 2005

Because you receive the data in a flat file and the data is correct, there is no compelling reason to change it to a normallized structure. There would be considerable savings in file size, but only if you substitute IDs for the flat data, and use File Maintenance to Compact the file. It would also give you a list of counties, etc., and a way to Sum() results in Browse mode.

It's a good question, and an essential skill for a developer; many of my clients come to me with flattish files.

A fast method to get normalized data is to Export as summarized (grouped in 7). To get the counties: Show All Records. Sort by County. Export (as either text, or FileMaker if the data is inconsistent and needs cleanup). In the Export dialog there is a "Grouped by" panel above the fields (on the same screen in 7, finally :-), with the fields that you've sorted by. In this case County. Check it. Then export the County field. You will get only unique entries, hopefully all correct.

It is important that the data is consistent. If it is not (and it never is with businesses using flat files, as even a typo will add a unique entry), then you need to fix it in the original file first before exporting. One way to see this is to export as above, then look. Then fix it, in both files. Another is to go into Find mode in the original, and use Insert from Index to see the index for the field. Find the bad ones and fix them (use Replace Field Contents).

Once you have your unique entries, import them into their new table, which has an auto-enter serial ID defined. Be sure and leave the "allow auto-enter calculations" checkbox ON when you import.

Then create a temporary relationship between the name in the new table and the name in the flat one. Create the CountyID field in the flat table. Put your cursor in the field. Use the temp. relationship to Replace Field Contents, by calculation, the "name_relationship::ID" from the new table. That brings the new CountyID into the flat file, correct for each county.

Go to the Relationship Graph and switch the temporary "name" relationship to use the IDs instead. It's done. You can then either delete the original flat County field, or change it to an unstored calculation field, relationship::name. Or leave it as is.

An alternative to full normalization would be to leave the names and table as is, but create self-relationships on the names.

Brudderman · July 11, 2005

Fenton, thanks for the help on this. I do see how to populate my new tables if I decide to go that way. But from what you say, I'm not sure that I ought to normalize. When I receive new election data, it will be in the same flat file structure. Thus I can import it straight into my existing table.

And though I didn't mention it in the original post, there is a second flat file from another source which provides an additional level of detail. I already have the necessary relationships established and have constructed the initial reports by putting this file in its own table.

What I might do, however, is use your procedure for creating a table of unique locations and use it to construct a method of dealing with location names that change from election to election. Normalizing them might give me a better way of handling that problem.

As you suggest, I'll look at the self-relationships and a few properly constructed TO's.

I really appreciate your taking the time to respond. I'm beginning to understand the problem a little better.

B.

Sign In

Need help in normalizing

Recommended Posts

Brudderman

Fenton

Brudderman

Create an account or sign in to comment

Create an account

Sign in

Browse

Site Support

Forums

Blogs

Marketplace

Activity

Important Information