Small link table vs. several set fields?

bcolburn · November 8, 2011

I just learned about using link tables for one-to-many and many-to-many relationships. When should I do that as opposed to just creating set fields (i.e. visible on all records, even if empty) in the database? When the many list is HUGE it makes sense to make a link table. When there are fewer options, set fields make sense. It seems the goal is to avoid storing data in records that doesn't need to exist. But here's an example.

Every patient (record) in the main table has (at least) four coronary arteries. We are recording the findings on autopsy for each vessel. But the autopsy person doesn't always record each vessel. So sometimes we get 0, sometimes we get 4, and sometimes more. There are 4 common arteries we record data for. Does it make more sense to just have fields for those arteries (with a portal for additional arteries) and have empty fields if they are not recorded or should I make a Patient-Vessel link table? My goal is to improve database speed and efficiency, but I don't know if creating specific records for each patient patient is faster and displaying them in a portal makes more sense than just having fields for all for and occasionally having some be empty. The same goes for several other measurements we look at, including wall depths, valve sizes, etc.

How can I know which technique to use?

bcolburn · November 8, 2011

The more I think about it, the same kind of goes for any field. It seems that you could theoretically use a link table for any field in a record if you wanted to replace a drop-down/checkbox-type layout. When does it make more sense to use a link table instead? I'm missing the core guideline here, it seems.

comment · November 8, 2011

I just learned about using link tables for one-to-many and many-to-many relationships.

A link table is a method for resolving a many-to-many relationship into two (or more) one-to-many relationships. You do NOT use link tables for one-to-many relationships.

bcolburn · November 9, 2011

Can you elaborate on that with an example of each case you just gave?

bcolburn · November 9, 2011

I just found this example:

For example, consider a database designed to keep track of hospital records. Such a database could have many tables like:

a Doctor table full of doctor information
a Patient table with patient information
and a Department table with an entry for each department of the hospital.

In that model:

There is a many-to-many relationship between the records in the doctor table and records in the patient table (Doctors have many patients, and a patient could have several doctors);
a one-to-many relation between the department table and the doctor table (each doctor works for one department, but one department could have many doctors).

However, this is still unclear to me. For example, a patient may have records for several heart valves. However, each heart valve is also certain to have many patient records. More simplistically, a student could have multiple test scores and a particular test score is likely to be shared by several students. Based on what I am trying to grasp, it seems like you could call that a many-to-many relationship and use a link table for test scores and students, but why not just create a field called "test scores" and create a value list of scores? What is the advantage of using the many-to-many relationship? I realize this particular example is laughable, but it is helpful for me to ask, as I'm still a little lost.

comment · November 9, 2011

A & B are lawyers; they handle cases for their clients. A client can have many cases, but each case belongs to one client only. This is a one-to many relationship;

Clients -< Cases

C & D are entrepreneurs; they handle projects for their investors. An investor can invest in many projects and each project can have many investors. The relationship Investors >-< Projects is many-to-many and needs to be resolved into two one-to-many relationships with the help of a Investments join table:

Investors -< Investments >- Projects

comment · November 9, 2011

More simplistically, a student could have multiple test scores and a particular test score is likely to be shared by several students.

Actually, no: a score (as I understand the term) belongs to one student only. Many students can take the same test, but they each get their own score:

Students -< Scores >- Tests

If by "score" you mean A, B, C, D, and F, then yes - a score is shared between many students (or rather many students' test results). But each result has only one score.

bcolburn · November 9, 2011

It seems to me that the issue is knowing when something should be an entity and when it should be an attribute. For example, if a patient could have any number of 8 arteries examined and each examined artery has four additional data points to describe it, what makes more sense:

Create a Patients table and an Artery Report table (linked by patient_ID), then create four rows of drop-down fields that allow me to select the arteries observed from a value list then enter data. If no arteries are observed (possible), then there would be several rows of empty fields on the record (which is bulky, no?)

OR create this relationship:

Patients -< Patient’s Particular Arteries >- All Possible Arteries, then display the link table records in a filtered portal on the Patients record.

Our database is HUGE and we are capturing SEVERAL different types of data like this (valves, wall measurements, etc.), all of which are inconsistent in how many data points we receive and thus, need to account for. So what makes the most sense? Is it bulkier (and hence, going to tax the server we store the file on) more to have tons of link table records or to just have lots of empty fields? Is it still appropriate to treat arteries as an entity if there are so many other fields?

I can't upload my database for some reason, so hopefully this makes sense.

comment · November 9, 2011

Probably none of the above. :hmm:

Let me ask a few things that are missing from your description:

1. How many such Artery Reports can one patient have? Earlier, you called it an autopsy - is that the answer here?

1b. What percentage of patients will have such report?

2. What information will your solution store about an artery as such?

3. What is the purpose of gathering all this data? For example, will anyone ever ask: how many "xyz" results do we have for any type of artery?

Edited November 9, 2011 by comment

bcolburn · November 10, 2011

Oh good. Great.

In the case of valves, wall measurements, etc. all the data comes from a single report. But the report is not standardized (it is a narrative), so the arteries described (if described at all) differ from patient to patient. But each patient either has a report or doesn't. In these cases, the data we stores isn't complex. Maybe just two or three text fields to record measurements or condition of the structure.

I asked my boss and of course he says the end goal (as this is for research) is to be able to ask ANY research question that tabulates one or more fields from all patient records (i.e. how many patients had moderate aortic stenosis). My question is: what is the most parsimonious way to design the tables/fields/relationships to allow for this and still be efficient?

As an idea, the other things I am using link tables for right now: Different types of patient contact information, Patient study reports, Chemicals analyzed on chem/toxicology reports, Known medical records (by location), Patient prescriptions, and Patient Conditions.

Oh, and right now about 70% of all patients in the database have autopsy reports (which contain the valve/wall measurement permutations)

Again, what is the missing criteria I should be asking to determine if I should use a table or a field? Number of main entities with such child records? Significance of the data? Number of related data points for the particular subject?

comment · November 10, 2011

1. How many such Artery Reports can one patient have?

each patient either has a report or doesn't.

So one at most?

What information will your solution store about an artery as such?

the arteries described (if described at all) differ from patient to patient.

This doesn't answer my question about describing an artery as such. IOW, if you had a table of Arteries (with 8 records, I believe) - what fields would be in such table, apart from the artery name?

bcolburn · November 10, 2011

Yes, each patient only has one autopsy report at most. However, that report contains several sections, each with its own sub-sections and fields. For example, a patient MAY also have a toxicology report or histology report (and then if histology, they may or may not have a fibrosis grid). We get them all at the same time, but should I treat ALL of this data as one big record? Or should I treat each separate document (e.g. histology/toxicology) as a record? Subsection? Does this answer change if I say we are only interested in abnormal findings and don't want to record findings that are unremarkable? This was the reason I thought of this approach. It lets us create only records when abnormal findings are present and ignore data that we don't care about.

As for "as such", it depends on the question. For example each gross anatomy report should contain a description of vessels. We have this broken up into sub sections as such (formatted as HEADING:fields)

CHRONIC CAD: Condition, % Stenosis, Stenosis Location

SCARS: Domain of Heart, Scar Type, Depth, Width, Length

ACUTE CAD: Vessel, % Stenosis, Thrombus Present, Plaque Rupture Evident

Certain other sub-sections have more fields. But if you step back to the level of Gross anatomy (as a portion of the overall autopsy report), there are over 150 fields. This in turn is just one of eight documents we collect data from (though by far the most comprehensive).

Most importantly, the database is already being used with real data already entered. So I want to keep the data as close to its original state as possible while just reconsolidating it, if possible.

bcolburn · November 10, 2011

This is what I'm trying to work on fixing (first pic) and a possible solution that I adapted from previous situations (second pic).

comment · November 10, 2011

Yes, each patient only has one autopsy report at most. However, that report contains several sections,

I suggest we concentrate on the arteries for now.

I don't see the need for an Artery Reports table - if only because there's practically nothing to enter there other than a date. If the answer to my question in point 3 above is yes, then you absolutely cannot have dedicated fields for each type of artery.

This means there must be a record for each patient/artery combination - roughly along the lines you described as your second option:

Patients -< Patient’s Particular Arteries >- All Possible Arteries, then display the link table records in a filtered portal on the Patients record.

Except:

• I see no real need for an 'All Possible Arteries' table (I asked what fields would be there, and I don't think you have provided an answer);

• I don't see why the records need to be displayed in a filtered portal.

So far, I see a simple one-to-many relationship:

Patients -< PatientArteries

with the PatientArteries table having fields for:

• SerialID

• PatientID

• Artery (select one of 8 possible arteries)

• Finding (?)

The last one is iffy and depends on the exact meaning of:

each examined artery has four additional data points to describe it

bcolburn · November 10, 2011

That makes sense to me. What about this though? Someone may have one pathology report but not one histology sub-report. You cannot have histology without pathology. And if you have histology, you may have slides, but not necessarily. And you cannot have slides without histology and pathology both. What would that relationship look like? If you can answers this, then I think I may be good for a day or so ;)

bcolburn · November 10, 2011

Also, if I'm only recording abnormal records and ultimately do want to be able to say how many people had an abnormal finding in this particular sub-section of the autopsy report (e.g. gastro-intestinal), then does it makes sense to create a separate link table like this:

Patients = Autopsy Report -< Abnormal Section Record >- Autopsy Section -< Section Parameters

and use a second Section Parameters TO to record the actual results in each Abnormal Section Record?

Or should I just suck it up and place all the fields in the report not already in a new link table in the main Autopsy Report table?

comment · November 10, 2011

If you can answers this,

No, I am afraid I cannot - not in the way you presented it. The devil here is in the details. However, in my layman's imagination, I see the entire autopsy report as an extension of the same principle:

Patients -< Findings >- Tests

This may get more complicated if there are Tests that require the related records in Findings to have some special details - but I think we have been there already in another thread...

Patients = Autopsy Report -< Abnormal Section Record >- Autopsy Section -< Section Parameters

Yep, that's the one.

bcolburn · November 10, 2011

OK. Just spent my morning rehashing the relationships in a new document (see attached).

One major goal of our database is to be able to know where any single piece of information came from (i.e. which of the particular documents we have collected for a patient). To that end, we need to be able to both:

1.) Look at any piece of data and know where which document it came from,

2.) Easily produce a list of all documents we have collected for a particular patient as well as a list of the documents we know exist but

have not yet received.

Given everything we talked about in this thread, this is what I came up with. Does this seem logical to you? If not, where am I going wrong? I know you can't fix every little detail, but can you see any trends in the mistakes I am making? If so, can you suggest a good place to go to learn the data I need in order to correct them?

P.S. Since I am on a deadline and this is strictly about the relationships between tables, I did not actually design any layouts. I have only worked on the relationship graph.

New Relationships.fp7.zip

comment · November 10, 2011

Sorry, I am afraid you expect too much out of this. I don't know anywhere near enough about your real situation to make such a wide judgment.

In general, it's best to work out the relationships on an ERD first. Assuming that (1) all test results of a single test come from the same document, and (2) one document can be the source of two or more tests, the ERD regarding the parts we did discuss would probably look like this:

However, if the second assumption is not true, then PatientTests and Documents are one and the same.

As I already hinted earlier, I think that if you generalize the schema, it can contain most, if not all of patient's medical info; after all, it's just a collection of answers to a number of pre-defined questions. The questions themselves could be of any type - not just about arteries or autopsy.

bcolburn · November 10, 2011

OK, that is helpful. Simple examples work well.

Out of curiosity though, why repeat TO's for each document data table instead of creating a linear relationship such as:

Patients -< Patient Documents -< Data Tables

OR

Document Types -< Patient Documents -< Data Tables

If I used a separate TO of Documents for each data table, when I ultimately wanted to display all known records for a particular patient, which portal would I use?

P.S. I had to write, erase, and rewrite this response several times as I began to understand all of the implications for your answer. :)

comment · November 10, 2011

I have no more information now than I had before... My feeling is that you have way too many "Patient Data Tables".

bcolburn · November 10, 2011

You were right. I reassigned tables to make my data more generic (thank you for being patient while I finally got there...). My only remaining question (for the day) is this.

Going back to your the ERD you uploaded, how would you use that (or change it) to reflect the fact that each patient may have multiple different medical records from different locations. In your diagram, my records in the Documents table would be something like: Ambulance Runsheet, ER Records, Medical Examiner's Report, etc. but I want to cite the specific location of the patient's medical record.

I'm not really sure how to make medical records available for citation ONLY when a patient has existing records there. I have attached my ERD attempt to solve the issue. The darker green table is used to create records for patients and the lighter green TO is intended to then register those records as collected documents.

Vaughan · November 10, 2011

IMHO you're focussing on storing the data, rather than trying to get a job done (ie a business process).

Generalise. Focus on getting the job done. The job isn't storing the data -- a filing cabinet will do that with little effort. The job is... well I dunno, what exactly is the job?

bcolburn · November 10, 2011

Well the database was VERY poorly designed when I inherited it and our IT guy said that the calculations on our database overview page (with only 250 patients) were crashing the server based on the overall database design. I was tasked with redesigning the database to make it as efficient as possible so when we get 5,000 patients over the course of this 5-year study, our computers don't catch fire. Am I misinterpreting my job? If so, help reinterpret, please.

I really don't mean to keep bothering the guys helping on here, but I don't know where else to ask these questions. I have no database design experience and am getting paid to do it now, even though I wasn't hired to. I appreciate any bit I can get and totally understand if at some point, people have to say "Not my problem."

comment · November 10, 2011

Going back to your the ERD you uploaded, how would you use that (or change it) to reflect the fact that each patient may have multiple different medical records from different locations.

I am not sure it has to change. Let's take an example of an ER report for patient Adam. Suppose this report has blood test results and urine test results.

So far we have:

Patients:

• PatientID: 1, Name: Adam

Documents:

• DocumentID: 1, Title: ER Report

TestTypes:

• TestTypeID: 1, Type: Blood

• TestTypeID: 2, Type: Urine

PatientTests:

• PatTestID: 1, PatientID: 1, DocumentID: 1, TestTypeID: 1

• PatTestID: 2, PatientID: 1, DocumentID: 1, TestTypeID: 2

Now, in the TestResults table, we'll have about 20 records for the blood results and 12 for the urine.

I hope this make sense - otherwise you'll be seeing my ER report... :frantics:

bcolburn · November 10, 2011

Yup. Makes sense. Enough for today. Thanks for bearing with me. I'm only 23 and I graduated in Humanities. I'm learning as fast as I can. :)

comment · November 10, 2011

Make sure you - and your boss - read this:

http://fmforums.com/forum/topic/57970-does-anyone-takes-us-seriously/

bcolburn · November 14, 2011

Dude, you're preaching to the choir. I told him he should have hired a database designer for this WEEKS ago. Alas, the world of limited grant funding. Sorry for implicating you!

Sign In

Small link table vs. several set fields?

Recommended Posts

bcolburn

bcolburn

comment

bcolburn

bcolburn

comment

comment

bcolburn

comment

bcolburn

comment

bcolburn

bcolburn

comment

bcolburn

bcolburn

comment

bcolburn

comment

bcolburn

comment

bcolburn

Vaughan

bcolburn

comment

bcolburn

comment

bcolburn

Create an account or sign in to comment

Create an account

Sign in

Browse

Site Support

Explore

Affiliate Forums

Activity

Important Information