
This topic is 7823 days old. Please don't post here. Open a new topic instead.


Posted

Let's say I'm working with lots of dates and, for various reasons, need many different manipulations of those dates: Week, Month, Next Week, Last Week, etc.

I have two choices:

This_Week = WeekofYear(Date)

Next_Week = (WeekofYear(Date) + 1)

or

Next_Week = (This_Week + 1)

In other words, referring to existing calcs or starting fresh. For a big database with lots of files, which is the more efficient approach to take?
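For readers more at home outside FileMaker, the two choices can be mirrored in Python. This is only a sketch: `week_of_year` is a hypothetical stand-in for FileMaker's WeekOfYear( ), and it uses ISO week numbering, which is an assumption and may not match FileMaker's own rules. Both versions return the same value; the question is purely about evaluation cost.

```python
from datetime import date


def week_of_year(d: date) -> int:
    # Hypothetical stand-in for FileMaker's WeekOfYear(); ISO week
    # numbering is an assumption and may differ from FileMaker's rules.
    return d.isocalendar()[1]


def this_week(d: date) -> int:
    return week_of_year(d)


def next_week_fresh(d: date) -> int:
    # Choice 1: start fresh from the date.
    return week_of_year(d) + 1


def next_week_chained(d: date) -> int:
    # Choice 2: build on the existing This_Week calculation.
    return this_week(d) + 1


if __name__ == "__main__":
    d = date(2024, 3, 15)
    print(this_week(d), next_week_fresh(d), next_week_chained(d))  # prints: 11 12 12
```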

Posted

Honestly, I don't know which is more efficient. If you haven't done so already, I would test both methods, perhaps using a looping script that references the fields built each way. My guess is that the first method would be faster: I'm assuming these fields are unstored, so evaluating the second one would take a bit longer because it has to evaluate two unstored fields.
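The looping test can be sketched in Python too. The helper names are hypothetical, and the timings only reflect Python's own function-call overhead, not FileMaker's calculation engine; the point is the shape of the test: run each version over the same set of dates several times and keep the best time.

```python
import time
from datetime import date, timedelta


def week_of_year(d: date) -> int:
    # Hypothetical stand-in for WeekOfYear(); ISO numbering assumed.
    return d.isocalendar()[1]


def next_week_fresh(d: date) -> int:
    return week_of_year(d) + 1


def this_week(d: date) -> int:
    return week_of_year(d)


def next_week_chained(d: date) -> int:
    return this_week(d) + 1


def best_time(fn, dates, repeats: int = 5) -> float:
    """Best wall-clock time, in seconds, for calling fn on every date."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        for d in dates:
            fn(d)
        best = min(best, time.perf_counter() - start)
    return best


if __name__ == "__main__":
    first = date(2000, 1, 1)
    dates = [first + timedelta(days=i) for i in range(100_000)]
    print(f"fresh:   {best_time(next_week_fresh, dates):.4f} s")
    print(f"chained: {best_time(next_week_chained, dates):.4f} s")
```

Taking the best of several passes, rather than the average, reduces noise from whatever else the machine is doing at the time.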

Chuck

Posted

Hello McCormick,

In many cases, including the example you have cited, using a single calculation is more efficient when considered in isolation.

To illustrate this with your example: using the single calc, arriving at Next_Week involves four steps, viz.:

1. Retrieve the value for date

2. Resolve the WeekofYear( ) function

3. Add 1

4. Write the result out to cache/disk

i.e. four calls to the CPU (though not necessarily four CPU cycles, depending on the system and hardware architecture you're running on),

whereas the two-calc version requires:

1. Retrieve the value for date

2. Resolve the WeekofYear( ) function

3. Write the result of calc 1 out to cache/disk

and then

4. Retrieve the value for calc 1

5. Add 1 to it

6. Write the result out to cache/disk
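The two step sequences can be modelled as a simple trace in Python. This models the step counts in this post, not FileMaker's actual internals: each `log.append` marks one numbered step, and the `cache` dict stands in for the cache/disk write. All names are hypothetical.

```python
from datetime import date


def single_calc(record: dict, cache: dict, log: list) -> None:
    """Next_Week computed fresh from Date (the four-step sequence)."""
    d = record["Date"]
    log.append("retrieve Date")          # step 1
    week = d.isocalendar()[1]            # ISO week as a WeekOfYear() stand-in
    log.append("resolve WeekOfYear")     # step 2
    result = week + 1
    log.append("add 1")                  # step 3
    cache["Next_Week"] = result
    log.append("write Next_Week")        # step 4


def two_calcs(record: dict, cache: dict, log: list) -> None:
    """This_Week first, then Next_Week built from it (the six-step sequence)."""
    d = record["Date"]
    log.append("retrieve Date")          # step 1
    week = d.isocalendar()[1]
    log.append("resolve WeekOfYear")     # step 2
    cache["This_Week"] = week
    log.append("write This_Week")        # step 3
    this_week = cache["This_Week"]
    log.append("retrieve This_Week")     # step 4
    result = this_week + 1
    log.append("add 1")                  # step 5
    cache["Next_Week"] = result
    log.append("write Next_Week")        # step 6


if __name__ == "__main__":
    record = {"Date": date(2024, 3, 15)}
    for calc in (single_calc, two_calcs):
        cache, log = {}, []
        calc(record, cache, log)
        print(calc.__name__, len(log), cache)
```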

Now the plot thickens. If, as would seem to be the case, you already need to calculate This_Week in its own right, then referencing This_Week within the expression for Next_Week saves one step overall, reducing the Next_Week calc from four steps to three.

But while on the face of it this seems more efficient (albeit by only one step in this instance), it may not be so in practice. This is because many contemporary operating systems (and hardware platforms) can process more than one instruction simultaneously within each CPU cycle. And whereas the six-step procedure outlined above is one step shorter than the seven steps required to calculate the two results independently, its second three steps depend on the result of the first three and therefore cannot begin until the first three have finished, which forfeits any advantage the system architecture offers from simultaneous instruction handling.

So overall, for this specific example, calculating the two results independently is likely to be more efficient on most, if not all, system/hardware platforms.
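The argument about simultaneous instruction handling is essentially a critical-path argument: total work counts every step, while elapsed time cannot be shorter than the longest chain of steps that must run one after another. A minimal Python sketch, assuming (as the post does) that independent steps could run at the same time and that every step costs one unit; the step names are hypothetical labels for the sequences described above.

```python
def critical_path(deps: dict[str, list[str]]) -> int:
    """Length, in steps, of the longest dependency chain in a DAG."""
    memo: dict[str, int] = {}

    def depth(step: str) -> int:
        if step not in memo:
            memo[step] = 1 + max((depth(p) for p in deps[step]), default=0)
        return memo[step]

    return max((depth(s) for s in deps), default=0)


def chain(*steps: str) -> dict[str, list[str]]:
    """A strictly sequential run: each step depends on the one before it."""
    return {s: ([steps[i - 1]] if i else []) for i, s in enumerate(steps)}


# Two independent calcs: This_Week (3 steps) and a fresh Next_Week (4 steps).
independent = {
    **chain("tw.retrieve_date", "tw.week_of_year", "tw.write"),
    **chain("nw.retrieve_date", "nw.week_of_year", "nw.add_1", "nw.write"),
}

# Chained: Next_Week cannot start until This_Week has been written.
chained = chain(
    "retrieve_date", "week_of_year", "write_this_week",
    "retrieve_this_week", "add_1", "write_next_week",
)

if __name__ == "__main__":
    for name, deps in (("independent", independent), ("chained", chained)):
        print(name, "total steps:", len(deps), "critical path:", critical_path(deps))
```

Under this idealised model the chained version does one step less work (6 versus 7) but has a longer critical path (6 versus 4), which is the trade-off described above. Real hardware and FileMaker's calculation engine will not match the model exactly.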

However, where the calc for This_Week is a much more convoluted one (e.g. involving 50 steps rather than three), the balance would tip the other way, as there would be many more steps to save by referencing This_Week within the expression for Next_Week.
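The step counting generalises. Assuming, following the counts in this post, that This_Week takes n steps, a fresh Next_Week repeats all n of them plus an "add 1" step, and a chained Next_Week needs three steps (retrieve This_Week, add 1, write), then computing both independently costs 2n + 1 steps while chaining costs n + 3, a saving of n - 2 steps. A small Python sketch of that arithmetic:

```python
def step_totals(n: int) -> tuple[int, int]:
    """Return (independent, chained) step totals when This_Week takes n steps.

    Assumption: a fresh Next_Week repeats all n steps plus one 'add 1' step,
    and a chained Next_Week needs three: retrieve This_Week, add 1, write.
    """
    independent = n + (n + 1)
    chained = n + 3
    return independent, chained


if __name__ == "__main__":
    for n in (3, 10, 50):
        independent, chained = step_totals(n)
        print(f"n={n:>2}: independent={independent:>3}, "
              f"chained={chained:>3}, saved={independent - chained}")
```

With n = 3 this reproduces the seven-versus-six count from the example; with n = 50 it gives 101 versus 53. This counts total work only, so the point made earlier about dependent steps having to wait still applies.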

Irrespective of this, I would caution against setting up long chains of calcs in which each depends on the preceding one, because each must then be processed in turn before the final result can be returned. Try to keep the tiers of the dependency table down to two or three; solutions with dozens of layers of dependency invariably struggle.
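One way to see how many tiers a solution has is to list, for each calculation field, the fields it references and then compute the longest chain. A Python sketch, assuming the dependency map has been compiled by hand; the field names (including `Week_After_Next`) are hypothetical. Base fields sit at tier 0, and each calc sits one tier above the deepest field it references.

```python
def tiers(deps: dict[str, list[str]]) -> dict[str, int]:
    """Tier of every field; raises ValueError on a circular reference."""
    result: dict[str, int] = {}
    visiting: set[str] = set()

    def tier(field: str) -> int:
        if field in result:
            return result[field]
        if field in visiting:
            raise ValueError(f"circular reference at {field!r}")
        visiting.add(field)
        refs = deps.get(field, [])
        # A field with no references is a base field (tier 0).
        result[field] = 1 + max((tier(r) for r in refs), default=-1)
        visiting.discard(field)
        return result[field]

    for field in deps:
        tier(field)
    return result


# Each calc builds on the previous one: three tiers of calcs above Date.
chained_fields = {
    "Date": [],
    "This_Week": ["Date"],
    "Next_Week": ["This_Week"],
    "Last_Week": ["This_Week"],
    "Week_After_Next": ["Next_Week"],
}

# Every calc starts fresh from Date: a single tier of calcs.
flat_fields = {name: (["Date"] if name != "Date" else []) for name in chained_fields}

if __name__ == "__main__":
    for label, deps in (("chained", chained_fields), ("flat", flat_fields)):
        t = tiers(deps)
        print(label, "deepest tier:", max(t.values()), t)
```

Calcs that all reference Date directly stay at tier 1, which is the "start fresh" end of the original trade-off.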
