Complicated Design/Solution Question

goldcougar · September 5, 2004

I am trying to create a database for an adoption project, where it will hold thousands of people with their DNA profiles. As new people are added it must check all records to find matches based on their DNA profiles. Is this possible to do with a real-time calculation using a cartesian product of 2 tables, or does this have to be a script that loops through records?

Any ideas are appreciated.

LiveOak · September 5, 2004

What exactly does a "DNA Profile" look like when stored in the database? The amount of information to be matched will determine the available approaches.

-bd

goldcougar · September 7, 2004

Each person will receive an Idno, Sex, Age, and then the profile. The profile consists of 15 markers tested at 2 spots each. So, we're looking at 30 values per person. The values are just numbers from 1-99.

CoZiMan · September 7, 2004

You could use a cartesian product with this free plug-in from Formulations Pro, or any of the other possible computation techniques, although for your needs this one seems sufficient.

You don't need a larger dimensional array.

Calculator Tool

Deep Thought II · September 7, 2004

i would recommend generating some kind of HEX value, or another alphanumeric representation, that is auto-entered by a calculation based on the 30 values.

then you only need to compare this key against the rest of the data.

you can also take a look at the search algorithms available in programming. choosing the right algorithm can greatly improve your search and compare time.

furthermore, if you have a large number of clients, are you sure that this key will not have duplicates (since it's not the entire string of DNA)?

goldcougar · September 7, 2004

Thanks for all the responses...this is my first time posting a question here, and I'm pretty impressed by the feedback.

The search algorithm isn't the problem, the problem is comparing all records with each other. Here is the way it needs to work...

Imagine a database of, let's say, 10,000 people with their DNA profiles. So, 1 record per person (in the person table), with fields of IDno, Age, Sex, Probe1a, Probe1b, Probe2a, Probe2b, etc. up to Probe15b.

Since people get 50% of their DNA from their mom and 50% from their dad, the database then needs to find all records for which there is a 50% match across all 15 probes (meaning ProbeXa matches, or ProbeXb matches).

------------------------

My initial thought was to have a record for each person, as mentioned above, then have another table called 'comparison', which would have 1 record per comparison between 2 people (basically storing the cartesian product of the person table with itself). Then I could do the calculations, but that's a lot of records: 100 people turns into 100 records in the 'person' table and 10,000 records in the 'comparison' table.
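As a rough check on those numbers, here is a small Python sketch (outside FileMaker; the function name is made up for illustration). It assumes that a match is symmetric and that nobody is compared with themselves, in which case the full cartesian product counts every pair twice plus the self-pairs.

```python
from itertools import combinations

def comparison_counts(n):
    """Return (rows in the full n x n product, distinct unordered pairs)."""
    full_product = n * n                 # every person against every person, self included
    distinct_pairs = n * (n - 1) // 2    # each pair once, no self-comparisons
    return full_product, distinct_pairs

# Sanity check against an explicit enumeration for a tiny table of 5 people.
ids = list(range(1, 6))
assert len(list(combinations(ids, 2))) == comparison_counts(len(ids))[1]

for n in (100, 10_000):
    full, distinct = comparison_counts(n)
    print(n, full, distinct)
# 100 10000 4950
# 10000 100000000 49995000
```

Under the same assumptions, adding one new person to a table of n existing people only needs n new comparisons, so the whole product never has to be recomputed when records arrive one at a time.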

Deep Thought II · September 7, 2004

actually, goldcougar, the search & match algorithm matters a lot in this case. because there are various methods for searching records, the resultant time varies.

most of the time we use "sequential search", that is, we compare the key against every record in order, from the beginning to the end (or from the end to the beginning). this is only one of the methods that is generally used.

however, once the number of records becomes large, sticking to sequential search takes too much processing time, because we have to go through each record for each key. that is why there are a lot of other search methods (or solutions), designed specifically for different needs (such as data sets that hold thousands or millions of records, with a single key or multiple keys).

in your case, this will be a multi-key search and match algorithm with a large number of records. you will need to find a suitable search & match algorithm to achieve a good time. otherwise you will be implementing a large amount of code and scripts, and waiting forever for the computer to finish a job.

an extra idea would be that you can export the key values to an external text file and use some other programs to process it instead of processing it within Filemaker.

i haven't tried implementing different search algorithms in Filemaker, but i really doubt it's capable of complicated searches given the limitations of ScriptMaker.

i really can't recommend which search algorithm to use right now because i haven't touched that stuff for 2 years!... i have actually forgotten a lot of them :-( just remember a few key concepts... that's all.

i can only tell you... there are a lot of search & match algorithms out there, and they are designed specifically for different kinds of searches and length of keys. if you used the right one, you can shorten your processing time and scripts by a great amount. but if you used the wrong one, it will cost you lots of time doing searches... even headaches :-p

in the end, in a multi-key search environment and large number of records... i definitely will not recommend sequential search... :-D

hope that helps...

p.s. you might not want to do what you initially thought of... that is going to inflate the database dramatically. although Filemaker says that theoretically it can hold an infinite number of records provided that there is an infinite amount of physical storage space... your search & compare time will keep growing as well...
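To make the alternative to a sequential search a bit more concrete, here is a minimal Python sketch (not FileMaker) of one possible approach, an inverted index. Everything in it is an assumption for illustration: the class name, the representation of a profile as a list of (a, b) pairs, and the rule that the a value is compared only with the a value and the b value only with the b value. It has not been benchmarked against a plain loop.

```python
from collections import defaultdict

class ProfileIndex:
    """Maps (probe number, position, value) to the set of person ids holding it."""

    def __init__(self):
        self._index = defaultdict(set)
        self._ids = set()

    def add(self, person_id, profile):
        """profile is a list of (a, b) value pairs, one pair per probe."""
        self._ids.add(person_id)
        for probe, (a, b) in enumerate(profile):
            self._index[(probe, "a", a)].add(person_id)
            self._index[(probe, "b", b)].add(person_id)

    def matches(self, profile):
        """Ids of stored people who agree on the a or b value at every probe."""
        candidates = set(self._ids)
        for probe, (a, b) in enumerate(profile):
            same_a = self._index.get((probe, "a", a), set())
            same_b = self._index.get((probe, "b", b), set())
            candidates &= same_a | same_b   # must agree at this probe too
            if not candidates:
                break                       # nobody left, stop early
        return candidates

index = ProfileIndex()
index.add(1, [(1, 2), (3, 4)])
index.add(2, [(5, 8), (3, 7)])
print(index.matches([(1, 6), (3, 7)]))  # {1}
```

Querying the index before adding the new person keeps a record from matching itself. The lookup touches only the people who share a value at each probe, instead of walking every record.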

BruceJ · September 7, 2004

I'd agree with CoZiMan: use the cartesian product, or lock yourself in a room for a few hours and write one calc that creates a key that encompasses all the various possibilities of the 30 matches.

You only have to do this once of course and it's not difficult, just tedious.

I've attached a simple sample file with only 4 markers.

The match field is a Calc - TEXT that looks like this:

marker1 & "#" & marker2 & "

DNA_test.zip
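The attached file is not reproduced here and the calc above is cut off, so the following is only one possible reading of the "key that encompasses all the possibilities" idea, sketched in Python rather than as a FileMaker calc. The function name, the a/b position tags, and the positional comparison rule are assumptions. The idea is to build one "#"-joined key for every way of picking the a or the b value at each marker; two people then match when they share at least one key.

```python
from itertools import product

def match_keys(profile):
    """All '#'-joined keys formed by picking the a or the b value at each marker.

    profile is a list of (a, b) pairs; the a/b tags keep the two positions apart.
    """
    per_marker = [(f"a{a}", f"b{b}") for a, b in profile]
    return {"#".join(choice) for choice in product(*per_marker)}

jane = [(1, 2), (3, 4)]
john = [(1, 6), (3, 7)]

print(sorted(match_keys(jane)))                     # ['a1#a3', 'a1#b4', 'b2#a3', 'b2#b4']
print(sorted(match_keys(jane) & match_keys(john)))  # ['a1#a3']
print(len(match_keys([(1, 2)] * 4)))                # 16
```

Each marker doubles the number of keys, so 4 markers give 16 keys per person and the full 15 markers would give 2**15 = 32768, which is probably why writing it by hand is tedious.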

goldcougar · September 7, 2004

I apologize for the confusion, but the cartesian product is not among the probes, it is among the people. Meaning each person's probes must be compared with every other person's probes in the database.

So, if you have records that look like:

Jane Doe

Probe1A = 1, Probe1B = 2

Probe2A = 3, Probe2B = 4

John Doe

Probe1A = 1, Probe1B = 6

Probe2A = 3, Probe2B = 7

Then the database will show that Jane Doe matches John Doe because there is a 50% match (Jane's Probe1A = John's Probe1A AND Jane's Probe2A = John's Probe2A).

So, if there were 1000 records like that, the database would show all match combinations. Sorry for the confusion.
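For anyone following along, the rule can be written out as a short Python sketch (not FileMaker). It assumes the comparison is positional, A against A and B against B, as in the Jane and John example; the third record and the function names are made up for illustration.

```python
from itertools import combinations

def is_half_match(p, q):
    """True if, at every probe, p and q agree on the A value or on the B value."""
    return all(pa == qa or pb == qb for (pa, pb), (qa, qb) in zip(p, q))

def find_matches(people):
    """people maps a name to a list of (A, B) pairs; returns all matching name pairs."""
    return [(x, y) for x, y in combinations(sorted(people), 2)
            if is_half_match(people[x], people[y])]

people = {
    "Jane Doe": [(1, 2), (3, 4)],
    "John Doe": [(1, 6), (3, 7)],
    "Sam Roe": [(1, 2), (9, 9)],   # hypothetical record that fails at probe 2
}
print(find_matches(people))  # [('Jane Doe', 'John Doe')]
```

With the real data each profile would hold 15 pairs instead of 2; the functions do not change.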

BruceJ · September 7, 2004

Gotcha, GoldCougar.

The example in my previous post looks for matches between people, not within one person's record, thus the Self Relationship, which also includes a "does not equal" condition on the person's ID. This "does not equal" will exclude matching back to itself.
