comment Posted May 26, 2008 I am rather surprised it didn't slow down when you created 150,000 records in Target. After all, it has to check the uniqueness of each imported record against the entire Target table. It seems more like a quirk of implementation than any logical reason I can think of. It proves once again that ultimately everything needs to be tested. Michael? Can you tell me where I might find the reference to this? See: http://www.fmforums.com/forum/showpost.php?post/289917/ As you can see, you're not the only one who's behind on their speed tests...
SurferNate Posted May 28, 2008 Speed seems to take a second for every 2,000 data records (actually it is 4 seconds for 10,300 Data records). If Data were huge maybe this wouldn't be the way to go but I truly believe that any method would still be slower than this one. Okay, I'll finally shut up. Hopefully Michael will fill in any blanks for us. So there is still in fact a time delay for the import? More than anything, the news, and the new learning experience for me, is understanding that large data sets can actually take a lot of time to process. I have become accustomed to my nice "small" data sets processing so fast that I can basically ignore small differences in processing time. It is interesting, and important to me, to know that when data sets grow beyond a very limited range, structure and method are paramount to success. Shoot, the ability of Mac OS X to pull a clean list of found records almost "instantly" from a 120 GB drive (100K+ files) tells me a little something about the differences in types of data processing and storage. This little fish just took a tiny peek at the ocean...
comment Posted May 28, 2008 the ability of Mac OS X to pull a clean list of found records almost "instantly" from a 120 GB drive (100K+ files) FileMaker can do the same thing - AFTER it has indexed the records. That's one thing to keep in mind when working with large data sets - always try to go the indexed route.
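To make that point concrete, here is a rough sketch in plain Python (not FileMaker internals; the table sizes and key format are just illustrative). Checking each incoming value against an unindexed table means scanning the whole table per record, while a pre-built index makes each check a near-instant lookup.

```python
import time

# Simulated Target table (existing records) and Source table (incoming records).
target_values = [f"{i} | {i * 400}" for i in range(50_000)]
source_values = [f"{i} | {i * 400}" for i in range(49_000, 51_000)]

# Unindexed: each incoming value is compared against the entire Target list.
start = time.perf_counter()
dupes = sum(1 for v in source_values if v in target_values)   # linear scan per value
print(f"full scan: {dupes} duplicates in {time.perf_counter() - start:.2f}s")

# "Indexed": build a lookup structure once, then each check is a hash lookup.
index = set(target_values)
start = time.perf_counter()
dupes = sum(1 for v in source_values if v in index)
print(f"indexed:   {dupes} duplicates in {time.perf_counter() - start:.4f}s")
```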
LaRetta Posted May 28, 2008 Author (edited) I am rather surprised it didn't slow down when you created 150,000 records in Target. After all, it has to check the uniqueness of each imported record against the entire Target table. It seems more like a quirk of implementation than any logical reason I could think of. I've been considering what you said and I wonder if my test was not proper. I created the 150,000 records in Trigger randomly because I made the following assumption: since FM must compare each Source record to each Target record to see if it is unique, it doesn't matter WHAT the record itself contains (in Trigger), i.e., it is the fact that it must be compared that would make the speed difference. So I originally turned off validation and ran the test. I have attached my test file (which is simply yours with records added to both sides; approx. 150,000 in Target and 10,000 in Source). I wonder if my mis-thinking skewed my test results and makes my test invalid. I also wonder if my premise is incorrect, i.e., maybe FM does NOT need to compare every record. Maybe FM uses the Import map like a filtered relationship, to compare only those records in Source to those records (if they exist) in Target. It's a twisted thought but then, I'm a twisted person. It seems that either my test is invalid or FM doesn't do what I would think (compare each record) ... UPDATE: Well, my test file was too big to attach. And I wonder if it's necessary anyway, since it is the TYPE of data within it that I question. If you think it would be helpful to see the file then I can cut the numbers down a bit and attach it. DOUBLE-UPDATE: In Trigger, I added a unique SourceID (I used Replace Field Contents with a serial) and a flat value of 400 so the Unique_Concat was unique. The name I assigned as "test" & value. Then I turned the validations back on to run the speed tests. LaRetta Edited May 28, 2008 by Guest Added second update
comment Posted May 28, 2008 It's difficult to construct a meaningful test when you don't know the tested mechanism. That said, I would be happier with a serial ID and a random value - just to be sure the index doesn't have an easy job.
LaRetta Posted May 29, 2008 Author (edited) Well, I randomized to the max. Unique_concat examples: 1416 | 1715.369362437803968 and 60914 | 648.079175058789152. I took the existing value, SourceID & " | " & Value * ( Random * 3 ) / .25 ... or some such gibberish. Re-running the test took 5 seconds. I then tried to remove the index, which was quick. But since an index is required for unique validation, I had to add it back, and the index MUST be set to 'Automatically create indexes as needed.' It wouldn't work at all unless 'auto create indexes' was CHECKED. While reindexing, I couldn't leave Define Fields for probably 5 minutes. I then ran the test again and it took the same 5 seconds. I speculate that, if the field is indexed (and it must STAY indexed, which is why it requires 'automatically create indexes'), then the size of Target does not matter, because FM requires that it be allowed to keep the index up to date. Speed didn't increase by even one second (proportionally) for each size-group of records I tested, no matter the type or size of data in Unique_concat. Validation relies on this up-to-the-minute index, so it remains very fast at the moment of import. I could be wrong, but that's what the results seemed to say to me. Correction: I had originally written 5 minutes; it was 5 seconds. LaRetta Edited May 29, 2008 by Guest
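For what it's worth, a tiny Python approximation of the randomization LaRetta describes (my own paraphrase of the calc, purely illustrative) produces keys of the same shape as her examples:

```python
import random

def randomized_unique_concat(source_id, value=400):
    # Mirrors SourceID & " | " & Value * ( Random * 3 ) / .25 from the post above.
    return f"{source_id} | {value * (random.random() * 3) / 0.25}"

print(randomized_unique_concat(1416))    # e.g. 1416 | 1715.3693624378039
print(randomized_unique_concat(60914))   # e.g. 60914 | 648.0791750587891
```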
David Jondreau Posted May 29, 2008 So, if a field in the Target table is indexed, FM passes the index on import, so 1) a new index doesn't need to get created and 2) FM can check the index for validation basically immediately? I think I'm confused but that's not unusual.
comment Posted May 29, 2008 I don't think the index is "passed" on import - I think it is consulted by the import, and also updated during the import process. If there are duplicate values in the source, only the first one will be imported - even if the value didn't exist in the target table (and therefore neither in the index) beforehand. I'd venture to guess that if the table sizes were reversed, the import time would increase by the time it now requires to create the index.
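A hedged sketch of the mechanism comment is describing, as a small Python model rather than anything from FileMaker's actual code: the Target index is consulted before each record goes in and updated as soon as it does, which is why the second copy of a duplicate value within Source is rejected even when the value was not in Target beforehand.

```python
def import_with_unique_validation(source_records, target_records, key):
    """Model of an import where the key field must be unique (validate always)."""
    index = {key(r) for r in target_records}     # existing Target index
    imported, skipped = [], []
    for record in source_records:
        k = key(record)
        if k in index:                  # consult the index
            skipped.append(record)      # validation fails, record not imported
            continue
        target_records.append(record)   # import the record...
        index.add(k)                    # ...and update the index immediately
        imported.append(record)
    return imported, skipped

# "b" appears twice in Source; only the first copy gets in, even though
# "b" did not exist in Target (or its index) beforehand.
target = [{"Unique_concat": "a"}]
source = [{"Unique_concat": "b"}, {"Unique_concat": "b"}, {"Unique_concat": "a"}]
ok, dupes = import_with_unique_validation(source, target, lambda r: r["Unique_concat"])
print(len(ok), "imported,", len(dupes), "skipped")   # 1 imported, 2 skipped
```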
LaRetta Posted May 29, 2008 Author (edited) Yes, it seems the Index is consulted (I like that description for it). If it doesn't import the second duplicate, then the index must be updated with each imported record, and that would explain why 1) 'Automatic Index' must be on - to protect against a duplicate record being imported and failing validation - and 2) the speed hit is on the Source side, because the index must be updated (in Target) for each record imported (and speed decreased in proportion to the number of records being imported). FM insisting on an always-updated index (Automatically create indexes) makes sense here. I confess that I manually control the 'auto' indexing and usually turn it off. This will be one instance (and might there be more?) where it must be on (or works better if it is on). This is wonderful stuff to play with. UPDATE: I also notice that I had said it took 5 minutes to run the test after it was reindexed (see the correction in my prior post). It again took 5 SECONDS to run. Sorry for my error there. It took 5 minutes in Define Fields to re-index, but the tests (importing) always took only 5 seconds (when I had 10,000 Source records). Edited May 29, 2008 by Guest Added update
SurferNate Posted May 29, 2008 (edited) Yah, anyway, while I burned keys typing this up it seems others came to a similar conclusion! It occurs to me that someone with time to waste might create a pure test file that generates arbitrarily sized data sets in at least two tables, including stored and unstored values, along with dates, text, and numbers, via a script. Then one could just download the file and generate a test set of data locally to see if a certain structure is practical. So many conditions might affect the outcome. What if one needed to import and validate against the current value of an unstored calc across 100K records? LaRetta, I would think, after what I have learned here, that the validation is extremely fast because the "unique" value can be compared to a stored index on Target. The only speed issue here is the import of the Source data. You are really only parsing the full set of Source records here, not Target records. I would be willing to wager that if an imported value on Source is either unstored, or Source is a very large set, then it will drag the process down. Comment may have named the real suspect here. Working with indexed data on both sides of any database transaction is probably more efficient by several orders of magnitude. Edited May 29, 2008 by Guest
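Along the lines SurferNate suggests, a throwaway generator could at least produce the raw data for such tests. The sketch below (Python; the field names, file names, and duplicate rate are all assumptions, not anything from the files discussed in this thread) writes a large Target set and a smaller Source set with a controllable share of deliberate duplicates, ready to import for timing runs.

```python
import csv
import random

def write_set(path, n, start_id=1, dupe_rate=0.0, existing_keys=None):
    """Write n rows; optionally reuse keys from existing_keys to force duplicates."""
    keys = []
    with open(path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["SourceID", "Name", "Value", "Unique_concat"])
        for i in range(start_id, start_id + n):
            if existing_keys and random.random() < dupe_rate:
                key = random.choice(existing_keys)          # deliberate duplicate
            else:
                key = f"{i} | {round(random.uniform(0, 2000), 6)}"
            keys.append(key)
            writer.writerow([i, f"test{i}", key.split(" | ")[1], key])
    return keys

# 150,000 Target rows, then 10,000 Source rows of which roughly 30% collide.
target_keys = write_set("target.csv", 150_000)
write_set("source.csv", 10_000, start_id=150_001, dupe_rate=0.3,
          existing_keys=target_keys)
```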