Jump to content

Fastest way to get most recent related record?


xochi
 Share

This topic is 4347 days old. Please don't post here. Open a new topic instead.

Recommended Posts

I have a Person table and a Payroll table, and they are related by a numeric ID number. The Payroll table has entries only when a paycheck is issued (once per month, on the 1st).

I need to calculate, for each person, the # of months since the last paycheck.

This is in a very large database (nearly 100,000 people with several million payroll records) so performance is critical.

Question: what's the most efficient way to do this calculation?

* Using an un-sorted relationship based on Person::ID = Payroll::ID, I could use

Max(Payroll to Person::Date)

However, logically this requires FM to access every related record to figure out which is the highest value.

* Or, I could use a sorted relationship (Person::ID = Payroll::ID, sort by Date [Descending], in which case

Payroll to Person::Date

will give me the most recent date.

However, logically this requires FM to sort every related record to figure the highest value -- sorts in FM seem fairly slow.

As it turns out, the records in the Payroll table always arrive in chronological order, so in theory I could use something like this to simply grab the last record (using an un-sorted relationship)

* Let ( dates=List(Payroll:Date) ; RightValues(dates;1) )

I'd appreciate real-world advice/experience if anyone has benchmarked these methods and found one to be better than the other, or if there are other techniques that are even faster?

Also, this is in FMServerAdvanced 9v3 on Mac.

EDIT: typos, clarification.

Link to comment
Share on other sites

How about:

Last ( Payroll::Date )

Thanks : I didn't know about that Last() function.

I will experiment with this. It turns out that there are some cases where the payroll data is out-of-order (I had forgotten this) so I can't use this in every case, but in those cases where I can, I suspect Last() will be much faster than using Max() or the sorted relationship...

Link to comment
Share on other sites

Disclaimer: I have not tested this and I know practically nothing about the inner workings of FMP. The assumption that Max() would be slower than Last() may seem reasonable - but over the years I've seen even more reasonable assumptions refuted by actual tests.

Link to comment
Share on other sites

Disclaimer: I have not tested this and I know practically nothing about the inner workings of FMP. The assumption that Max() would be slower than Last() may seem reasonable - but over the years I've seen even more reasonable assumptions refuted by actual tests.

I agree 100% -- some things that should be faster aren't.

Here's some actual benchmarking data. The scenario is as described. The test is using a "replace all" to set a numeric field with a calculated result.

1,000 records. All times run 3x in a row and the final result taken (so that all data is cached in memory).

Time  Calculation

-------------------------------------------------------------------------------

  8.5  Person to Payroll:Date (relationship by ID, sorted on date reverse)

  5.6  Max(Person to Payroll::Date) -- relationship by ID only, not sorted

  4.3  Last(Person to Payroll::Date) -- relationship by ID only, not sorted

So, it does appear that the speeds make sense -- Last() is faster than Max() which is faster than getting the first record of a reverse-sorted date relationship.

What's really interesting is the speed difference on repeating the runs. The first time I ran the test (when none of the records were in the cache) it took over 110 seconds, or 15 times slower.

So I think in my case, the problem is not so much sorting or processing the records, as it is simply reading the records from disk. I know this server needs more ram and a faster disk, so that may be my next step rather than mucking around with the calculations...

Link to comment
Share on other sites

What's really interesting is the speed difference on repeating the runs. The first time I ran the test (when none of the records were in the cache) it took over 110 seconds, or 15 times slower.

That's the problem with these tests: you have to be really careful about what exactly are you testing. Indexing and caching can be very significant factors here.

On the same note, it is my personal opinion that Anchor/Buoy (which you appear to be using) can also detract from performance - see:

http://fmforums.com/forum/topic/56132-under-the-hood-question/page__p__265221#entry265221

http://fmforums.com/forum/topic/61041-relationship-graph-valuelists-by-groups-like-scriptmaker/page__view__findpost__p__288582

Link to comment
Share on other sites

This topic is 4347 days old. Please don't post here. Open a new topic instead.

Create an account or sign in to comment

You need to be a member in order to leave a comment

Create an account

Sign up for a new account in our community. It's easy!

Register a new account

Sign in

Already have an account? Sign in here.

Sign In Now
 Share

×
×
  • Create New...

Important Information

By using this site, you agree to our Terms of Use.