
Recommended Posts

  • Newbies
Posted

I have a very large table (137,000 records, a 49 MB database) that I am processing with a script. Part of the script moves the contents of a field in table1 (the large one) to a field in table2. The move is done field by field, and then the script advances to the next record with Go to Record/Request/Page [Next]. The script works fine, but the further it gets into table1, the slower it runs. By the time it reaches record 5,000 the move is taking minutes (per record!!). Letting the script run for 15 hours on a G5 with 2 GB of RAM only processes 10,000 records. I have tried the Flush Cache to Disk script step, increasing the cache to 99 MB, indexing everything, and indexing nothing, but nothing seems to improve the performance. BTW, doing a manual Find/Replace on a particular field takes about 15 minutes. Any ideas?

Posted

Hi John,

To solve this specific question, we'll likely need to see the actual script steps. Also, is this running on Server or not? And what specific rev of FileMaker are you running?

But I would ask why it's necessary to move the field contents from one table to another. Could you simply reference the related field's data? (Are they related?)

  • Newbies
Posted

The version is FileMaker Pro 7. It is running standalone on a G5 (no Server).

Background: I have been given a single large table containing 137,000 records, which I will refer to as table1. This table is basically an aggregation of a large number of lists that have been imported and massaged to line up with the field definitions. The basic fields are there: name, address, city, phone, email, which list the record was imported from, and membership date. There are many, many duplicates. The task is to de-dupe table1 based on name and address without losing any of the other data. So while the name + address fields may match, the rest of the fields contain unique data that needs to be preserved in the resulting de-duped record. That is, it is important to know how many times a person appears in table1 and all of their respective sources.

My solution: I created a new table (in the same database) that I will refer to as table2. This table has all the same primary fields as table1, but I have added additional fields that are basically new instances of important fields from table1. For instance, table1 has a listname field that records which list the record came from; table2 has listname1, listname2, etc. Bad table layout, I know, but I do not want to create additional tables and define relationships between them to hold these values. Table1 is sorted by name and address, and table2 is empty.

The script starts with the first record in table1 and seeds table2 with all the pertinent fields. This sub-script is called MoveRecord and uses the New Record/Request, Copy, and Paste script steps. I discovered that in order to use the Copy and Paste script steps, you have to keep going back and forth between tables using the Go to Layout [] script step. Ponderous, but whatever.

The script then moves to the next record in table1 using Go to Record/Request/Page [Next] and compares its name/address with the name/address of the current record in table2. If they match, the desired fields from table1 are moved into the next available corresponding fields in table2: IsEmpty checks are run on the respective table2 fields, moving on to the next field until an empty destination field is found. Two calculated fields in table2 keep track of how many times the person appears in table1.

If the name/address don't match, then the record in table1 is moved to a new record in table2 and the process starts all over again via the Loop script step.

When the last record in table1 has been processed, table2 contains a de-duped list of names/addresses plus all the additional fields captured in each table2 record.
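
In rough pseudocode (Python here just to show the logic; name, address, and listname are simplified stand-ins for my actual fields, and this is not the FileMaker script itself), the loop amounts to this:

    # Rough sketch of the merge described above: table1 is sorted by
    # name + address, so each run of matching rows collapses into one
    # table2 record that keeps every source list name.
    def dedupe_sorted(table1_rows):
        table2 = []
        for row in table1_rows:
            key = (row["name"], row["address"])
            if table2 and (table2[-1]["name"], table2[-1]["address"]) == key:
                # Same person: drop this row's list name into the next
                # empty slot (listname1, listname2, ... in the real table).
                table2[-1]["listnames"].append(row["listname"])
            else:
                # New person: seed a fresh table2 record from this row.
                table2.append({"name": row["name"],
                               "address": row["address"],
                               "listnames": [row["listname"]]})
        return table2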

Pretty simple; it works, but it is slow. Using the Pause script step and stepping through the script, I have determined that it is the moving of fields from a record in table1 to a record in table2 that takes forever once you get more than a couple thousand records into table1.

Long explanation but I hope it clarifies things a bit.

Posted

From the algorithm you described, it makes sense that it takes longer with the later records.

There are a few ways to get unique records from a set with duplicates, and there are lots of threads about this in the forums. Any of those methods will get your unique name-address combinations into a separate table while you keep the original records to hold the additional fields: all the unique name-address pairs go into a Names table, and the original records with the extra source information stay in your Source table, with no nested looping required.

One de-duping method you might try: sort the Source records by your name-address calc, then export the name and address fields using the 'Group by' option to group the exported records by that calc. Then import the exported data into the Names table.
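
Just to illustrate the shape of the result (this is Python with made-up field names, not FileMaker steps; the grouped export does the equivalent collapsing for you), you end up with one row per unique name-address key:

    # One exported row per unique name-address key, ready to import into Names.
    from itertools import groupby

    def grouped_export(source_rows):
        name_address = lambda r: (r["name"], r["address"])  # the name-address calc
        rows = sorted(source_rows, key=name_address)         # sort by the calc first
        return [{"name": n, "address": a}
                for (n, a), _group in groupby(rows, key=name_address)]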

I'd also suggest creating a serial ID field in the Names table, and populating it along with a corresponding ID field in the Source table, so you don't have to rely on the name-address calc as a key going forward (names and addresses tend to change).
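
Again only as an illustration (Python, with a hypothetical name_id field standing in for whatever you call the ID), the idea is to assign each unique name-address a serial number in Names and stamp the matching number onto every Source record:

    # Give each Names record a serial ID, then write the matching ID
    # into every Source record so the calc key isn't needed later.
    def assign_ids(names_rows, source_rows):
        id_by_key = {}
        for serial, row in enumerate(names_rows, start=1):
            row["name_id"] = serial
            id_by_key[(row["name"], row["address"])] = serial
        for row in source_rows:
            row["name_id"] = id_by_key[(row["name"], row["address"])]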
