Jump to content
Claris Engage 2025 - March 25-26 Austin Texas ×
The Claris Museum: The Vault of FileMaker Antiquities at Claris Engage 2025! ×

This topic is 7159 days old. Please don't post here. Open a new topic instead.

Recommended Posts

Posted

Greetings, I have an ENORMOUS database (16 million records, only two related tables). "AcctNbr" ties the two together. The problem is that there are records for the same accountnumbers in the Accounts table, one was created for every year the account was open,

"AcctNbr"=1, "BillYr"=1998,

"AcctNbr"=1, "BillYr"=1999,

"AcctNbr"=1, "BillYr"=2000,

"AcctNbr"=1, "BillYr"=2001

What I need to do is this:

1) Show records with duplicate "AcctNbr" and then show only the "top 1" ("top 1" refers to the 'most recent year'. In the above example it would only show me the record for "AcctNbr"=1, "BillYr"=2001--everything else should be hidden/deleted.

2) [i can handle this part], Show all records that don't have dupe AcctNbr and preserve these so they don't get deleted when doing the above search.

It is something like FIND "AcctNbr"=!

& then

"BillYr"=%ExcludeAllButTheMostRecentYear%

Help, suggestions?

Posted (edited)

You will need to use a looping script to go through the duplicates.

Not sure the number of records makes it a pleasant expirience, I would suggest the method I uploaded in this thread:

http://fmforums.com/forum/showpost.php?post/133778

But the script needs a little tweaking in order not to omit single occurences as well ...or might they be there as well???

Other ideas however pops up in my mind, a portal with reversed sortorder due to the ties mentioned or the Last( function over the relation made as a scripted replace stuffing the value in a field in the Account Table???

--sd

Edited by Guest
Updated Broken Link
Posted

That's an interesting technique. I have looked at a couple of alternative methods to avoid looping through all duplicates. Both techniques involved using a relationship based on the "<" operator.

One used GTRR to skip to the next group, and the other used a global key field to set the flag field via the relationship without moving between records at all. Although they both worked, they were horrendously slow compared to the standard looping script.

Posted

Out of curiosity, I tried implementing the Fast Summaries method, but it is much slower than the standard looping script (unless I implemented it wrong). It seems to slow down exponentially as the number of records increases. It took 154 seconds to process 3000 records, compared to 22 seconds for the classic looping script.

It appears that the running count summary field gets very slow to calculate as the record number increases. However this gave me an idea for a variation using a Count(selfJoin::etc.) calculation to skip through records. This method proved to be fastest at 11 seconds for 3000 records, twice as fast as the classic looping script.

I have attached a file with all the methods I've tested including the two extremely slow methods I mentioned in my last post.

FindDupsVariations.fp7.zip

Posted

I hope Mr. Horak (Comment) sees this, summaries under 7.0 is much slower than they used to be ...I'm glad you had the time to test it and tweak the idea a bit. This is why forums rock ... Given enough eyeballs as they say in sourceforge!

--sd

Posted

Interesting, I'm surprised that a find on an unstored calc field would be so fast. This is certainly a switch from my past experience. Good work.

On a hunch, I switched to a blank layout and re-ran my method and the classic method. In that situation, they both ran faster, and took almost the same amount of time (7 seconds for mine, 9 seconds for classic). The times given are for my Powerbook G4, 400 MHz. OSX panther.

Comment, with your method, I noticed that it usually runs in about 1/3 the time of the other methods (2 or 3 seconds), but in some situations it actually runs slower. I haven't quite figured out what the reason is. But if you run your "just find" method immediately after running my method, it will take a second or two longer (sometimes 8 or 9 seconds). It doesn't seem to be related to sort order or the number of records in the found count.

BTW, in order to meet the original requirements of this thread, the result of the find should produce the record with the latest year. This requires changing the formula for cFlag from:

Year=SameAcctNo::Year

to:

Year=Max(SameAcctNo::Year)

This slows things down somewhat, but is still fast.

I've attached another copy of the file which includes the modifications discussed.

FindDupsVariations3.fp7.zip

Posted

I have noticed some unexplained phenomena as well: the first time I added Perform Find to your script, it took about a minute. Then it came back to around 6 - 7 seconds. I made the flag unindexed, then allowed indexing, but it didn't make difference any more. Mine ran at around 5 - 6 seconds, never less - and my processor is 1.25GHz, so go figure.

No change of the formula is required, IMHO. The relationship is sorted, therefore SameAcctNo::Year already is the highest related year (which should be a number field, BTW). I have never tested this thoroughly, but I tend to suspect the Max() function of being slow, and rarely use it.

Posted

At least with FMP 6 and earlier, when a find is performed FMP pulls the field data down from all records to the client, and then (if possible) caches it to temp disk space. It gets re-used for subsequent searches in the same session.

Exit FMP (or maybe even close the database) and the cached data is deleted.

Posted

Interesting point - but then my find, which is performed on unstored field, should also take the same amount of time - at least the first time. But it didn't.

Posted

No change of the formula is required, IMHO. The relationship is sorted..

Oh right. I missed that.

As for the unexplained change in execution times, it might be worthwhile doing a flush cache to disk to see if that makes a difference.

<edited>

Okay I tried a flush cache, and it has no effect. But here is something interesting. If you make any change to any field data before doing the unindexed find, it takes 8-9 seconds. If you do it again immediately, it takes about 2 seconds. I suspect that as FM does the unindexed find, it builds a temporary index which it may re-use in subsequent finds as long as no changes are made to any field data. Hence the subsequent finds are fast until data is changed.

Posted

That sounds reasonable. Perhaps the poster should clarify how often he intends to do this - it seems there is no single best choice for all scenarios.

  • 5 weeks later...
Posted

I was just wondering guys...is there any way to do this without using a "flag" field? I used the classic script in a DB to weed out duplicates, but I ran into trouble when I wanted to further cull the records I want.

For example, first I want to narrow down the records so that their names show only if their end dates aren't duplicates, and to do this I use the classic loop and a global savedID field. If[custID=gsavedID and endDate = gsavedEndDate], and this weeds out everything I want.

Next, I want to show the customers that are only in Japan. I was thinking of using the flag field and just replacing the flags for fields that weren't in Japan. So, a record would be flagged if it was a duplicate and if it's not in Japan. The problem with this is that the flag field somehow becomes 'locked,' a problem I wasn't aware of when I started making the scripts.

I was hoping that there was a way to look through the records, and instead of flagging the ones that I didn't want to show, keeping track of the ones that I did want to show. I didn't know how to do this if I want to show a record that is has a duplicate name and ID #, with two unique end dates, and they're both in Japan.

(ie: Joe's Harleys, 43992, 5/12/06, JP 01

Joe's Harleys, 43992, 5/12/06, JP 02

Joe's Harleys, 43992, 12/12/07, JP 03

Jim's Ducatis, 43921, 3/4/06, JP 04

Joe's Harleys, 43992, 5/21/07, NZ 05)

Is it possible to show 1, 3, and 4? or is this just too complicated a task for FMP?

Any help will be greatly appreciated. I'm just so stuck right now it's not funny any more grin.gif

This topic is 7159 days old. Please don't post here. Open a new topic instead.

Create an account or sign in to comment

You need to be a member in order to leave a comment

Create an account

Sign up for a new account in our community. It's easy!

Register a new account

Sign in

Already have an account? Sign in here.

Sign In Now
×
×
  • Create New...

Important Information

By using this site, you agree to our Terms of Use.