Fastest way to find non-matching records

xochi · June 5, 2008

(Posted to the Server forum as I believe this is a performance issue specific to Server)

I have two very large tables (600,000 records each) that are related by a 2 field relationship -- Date, and ID (numeric). Call them Table A and Table B. It's a one-to-one relationship.

I have a need to make sure that every record in table A has a child in table B.

Right now, I'm doing this:


Go to layout (Table A)

Perform Find (finds for TableB::ID = *)

Show ommtted records

loop

  set field TableB::ID ; Table A::ID

  go to next ; exit after last

end loop

The problem: The initial find step takes 15 minutes or so! I've tried using the inverted find (e.g. search for TableB::ID = "=", but it seems that a Find for a "=" in a non-existant child record will not return anything...

Ideas? Is there a Find syntax for "find for records in table A which have no child in table B"?

I've tried GoToRelatedRecords (match all in found set) but it seems to have the same performance issues as the find for TableB::ID = "*"

mr_vodka · June 5, 2008

This may be extra overhead but what happens if you create a concatenated calc field of ID &"_" & Date in both tables, use ≠ as the join operator, and then GTRR... It may be faster.

xochi · June 5, 2008

I think you may be right -- with 600K records, the 2-field match is just too much. I may try an alternate approach: an auto-enter calc in TableA that pulls in the serial number from Table B, and then create a new relationship which is based on that serial number alone. This single-key should be much faster (I'd hope).

mr_vodka · June 5, 2008

Also...

This recent post may be of interest to you...

http://fmforums.com/forum/showtopic.php?tid/195605

LaRetta · June 5, 2008

Still, I wonder ... have you tried ( and I possibly just didn't read ) starting in Table B and GTRR to Table A? Then Show Omitted?

xochi · June 5, 2008

The preliminary testing with GTRR showed it to be no faster... :

xochi · June 5, 2008

Also...
This recent post may be of interest to you...

http://fmforums.com/forum/showtopic.php?tid/195605

Trying to understand this technique applied to my case. Would this mean that I would set up an import from table A to table B, and that if field validations in table B are correct (e.g. a Unique requirement on one of the fields) that this process happens very quickly?

LaRetta · June 6, 2008

I'm unsure if it would help you here but I think John is right - I think it would do the trick!!

You concatenate Date and ID in Table B. Set the validation to be 'always' and 'unique.' You don't even need to worry about showing all records first in Table B. Then script would be:

Go To Table A

Show All Records

Go To Table B

Import [ ADD ONLY and specify the two fields and any others you wish)

... if the records don't exist ( based upon the unique calculation you created ), they will be added into Table B. If they already exist in Table B, they will be skipped and not imported because they failed unique validation. It is very fast; particularly when, even after you have your found set, you still need to manipulate or create records in Table B in other methods but not this one.

I should have seen the connection to your need ... way to go, JOhn! :laugh2:

UPDATE: Actually, it is Michael's technique. But I wanted to expain a condensed version because I got a bit long-winded on that thread about it. But the technique is incredible!

LaRetta

Edited June 6, 2008 by Guest

xochi · June 6, 2008

Thanks all for the help. I've found what I think is a decent compromise.

Add a serial number field to table B, and a new foreign field to table A: TableA::fkTableBSerial (auto enter value = TableB::Serial, Indexed)

Then, my script changes as follows:


Go to layout (Table A)

Perform Find (finds for TableA::fkTableBSerial = "")

loop

  set field [TableB::ID ; Table A::ID]  /* creates the sibling record  if needed, but won't create a duplicate if it already exists */

  set field [TableA::fkTableBSerial; TableB::Serial] /* remember the foreign key */

  go to next ; exit after last

end loop

With this change, the initial find for non-matched records is lightning fast (a few seconds or two) since it's operating on a stored, indexed field. I don't have to create concatenated fields in table B and A.

However, in the data I'm using, 95%+ of the child records already exist in table B, so some of the performance may not be relevant to other situations.

I like the idea of using an import, too -- seems like both methods have some tradeoffs (adding extra fields vs. adding extra script steps, etc.)

Also, I realized my data set is bigger than I had said: Table A is 600K records, but Table B is 1.2 million+. Also, I was wrong when I said it was one-to-one, it's actually Many(TableA) to one (TableB).

We really need to put together a public domain benchmark suite for FileMaker server. :

Edited June 6, 2008 by Guest

Sign In

Fastest way to find non-matching records

Recommended Posts

xochi

mr_vodka

xochi

mr_vodka

LaRetta

xochi

xochi

LaRetta

xochi

Create an account or sign in to comment

Create an account

Sign in

Browse

Site Support

Forums

Blogs

Marketplace

Activity

Important Information