Jump to content
Server Maintenance This Week. ×

Two files useful for hiding complex relationships?


This topic is 6999 days old. Please don't post here. Open a new topic instead.

Recommended Posts

I have a project which basically involves two stages of operation: The Import & Cleanup stage, and then the Daily Use / Reports stage.

The Import & Cleanup stage has a terrible mess of relationships -- because we get non-normalized data from another agency without proper unique keys, we have do to all sorts of fancy fuzzy matching to get proper primary keys on the data. For example, I have a "worker" table and a "payroll" table that are related by ID, then by Last Name, then by Last Name + First Name, then by Last Name + Department, etc. This happens for 4 sets of tables, so the relationship graph is a mess.

However, once the import and cleanup step is done, the rest of the solution has the luxury of operating on clean, elegant relationships with fully normalized data.

My question is B) If I structure it properly, it sounds like I could use a two or more file solution to hide this complexity. Basically, I'd create one "import" file which would have all these messy relationships and import tables, and scripts to manage them. Then I'd create another "daily use" file which contains only the cleaned up normalized tables related by proper PK->FK relationsihps.

It seems like, in the long term, having this separation would make maintenance and design work much easier.

My assumption here, which I'm not sure about, is that if I have relationships in file "B" between two tables, that these relationships will not show up in File "A" ? Is this true? Or, once you put a TO for a table in file B on the graph for table A, does that suddenly bring in all the messy relationships for that table...

Link to comment
Share on other sites

No, TO's are whatever you define them to be, they don't automatically appear.

I would delete any transferred records from the messy file.

There would be several relationships from the messy file to the clean file, looking for possible duplicates, on various fields (these should be compound relationships if needed, not concatenated keys). But I can't see why you'd have any from the clean file to the messy one; there's no ID in the messy one, and as I said, tranferred records are only in the clean file.

Link to comment
Share on other sites

There would be several relationships from the messy file to the clean file, looking for possible duplicates, on various fields (these should be compound relationships if needed, not concatenated keys).

I agree on both counts.

But I can't see why you'd have any from the clean file to the messy one; there's no ID in the messy one, and as I said, tranferred records are only in the clean file.

Hmm. I'm using a differently methodology. When I'm cleaning up the "messy" data, I do it in the messy table, searching for and finding the matching record in the "clean" table -- when I find a match I insert the clean table's PK into the messy record. Ideally, I'd like to keep a copy of the "messy" data on line and easily viewable (because I need in some cases to be able to, for legal reasons, se exactly what data a certain agency provided us). So, if I do have a relationship from the clean table to the messy table in the "clean" file, I'm still worried this would then cause all of the "messy" relationships to show up when I'm working in the clean file. (Perhaps not on the relationship graph, but instead they'd show up everywhere else? Calculations, layouts, etc.?)

I should try it and see what happens...

Link to comment
Share on other sites

This topic is 6999 days old. Please don't post here. Open a new topic instead.

Create an account or sign in to comment

You need to be a member in order to leave a comment

Create an account

Sign up for a new account in our community. It's easy!

Register a new account

Sign in

Already have an account? Sign in here.

Sign In Now
×
×
  • Create New...

Important Information

By using this site, you agree to our Terms of Use.