Two files useful for hiding complex relationships?

xochi · March 5, 2005

I have a project which basically involves two stages of operation: The Import & Cleanup stage, and then the Daily Use / Reports stage.

The Import & Cleanup stage has a terrible mess of relationships -- because we get non-normalized data from another agency without proper unique keys, we have do to all sorts of fancy fuzzy matching to get proper primary keys on the data. For example, I have a "worker" table and a "payroll" table that are related by ID, then by Last Name, then by Last Name + First Name, then by Last Name + Department, etc. This happens for 4 sets of tables, so the relationship graph is a mess.

However, once the import and cleanup step is done, the rest of the solution has the luxury of operating on clean, elegant relationships with fully normalized data.

My question is If I structure it properly, it sounds like I could use a two or more file solution to hide this complexity. Basically, I'd create one "import" file which would have all these messy relationships and import tables, and scripts to manage them. Then I'd create another "daily use" file which contains only the cleaned up normalized tables related by proper PK->FK relationsihps.

It seems like, in the long term, having this separation would make maintenance and design work much easier.

My assumption here, which I'm not sure about, is that if I have relationships in file "B" between two tables, that these relationships will not show up in File "A" ? Is this true? Or, once you put a TO for a table in file B on the graph for table A, does that suddenly bring in all the messy relationships for that table...

Fenton · March 5, 2005

No, TO's are whatever you define them to be, they don't automatically appear.

I would delete any transferred records from the messy file.

There would be several relationships from the messy file to the clean file, looking for possible duplicates, on various fields (these should be compound relationships if needed, not concatenated keys). But I can't see why you'd have any from the clean file to the messy one; there's no ID in the messy one, and as I said, tranferred records are only in the clean file.

xochi · March 5, 2005

There would be several relationships from the messy file to the clean file, looking for possible duplicates, on various fields (these should be compound relationships if needed, not concatenated keys).

I agree on both counts.

But I can't see why you'd have any from the clean file to the messy one; there's no ID in the messy one, and as I said, tranferred records are only in the clean file.

Hmm. I'm using a differently methodology. When I'm cleaning up the "messy" data, I do it in the messy table, searching for and finding the matching record in the "clean" table -- when I find a match I insert the clean table's PK into the messy record. Ideally, I'd like to keep a copy of the "messy" data on line and easily viewable (because I need in some cases to be able to, for legal reasons, se exactly what data a certain agency provided us). So, if I do have a relationship from the clean table to the messy table in the "clean" file, I'm still worried this would then cause all of the "messy" relationships to show up when I'm working in the clean file. (Perhaps not on the relationship graph, but instead they'd show up everywhere else? Calculations, layouts, etc.?)

I should try it and see what happens...

Sign In

Two files useful for hiding complex relationships?

Recommended Posts

xochi

Link to comment

Share on other sites

Fenton

Link to comment

Share on other sites

xochi

Link to comment

Share on other sites

Create an account or sign in to comment

Create an account

Sign in

Browse

Site Support

Forums

Blogs

Marketplace

Activity

Important Information