Filtered Relations

LaRetta · June 12, 2009

We daily run automatic email notices. I have a table called ProjectSchedules (ProjSch). It contains one StartDate and, if it is Substantial Completion date, there is the unique ID (CapReq) in the SubstComplID field. If it is Construction Start date then the unique CapReqID is in the ConstStartID field.

I have a table of email notices. If Substantial Completion date falls within the date specified (today plus the global), then I need to send an email. I can ignore it if an once an email for that purpose and that ID has been sent. Substantial Completion is called eCode H and Construction Start is called eCode G. So if CapReq 1 has already been sent an eCode H, it should NOT appear in the Substantial Completion portal.

I could perform find but both tables are large and it seemed easier to look at the portal daily and send any related records. And that is similar to how the other auto emails are handled anyway.

Note that I have the portal on the eMail table but I don’t need to know which children are related to each email because the email table contains many emails which AREN’T eCode G or H. I just need one portal with all Substantial Complete which haven't been set and are within the timeframe and same with Construction Start. I cannot change the ProjSch table nor do I want to add a field to it (I do not want to mark the ProjSch records).

The process feels a bit clunky and I would appreciate ideas on improving it (please view attached file). Thank you in advance for any suggestions.

Schedules.zip

comment · June 12, 2009

Don't you need to CREATE the email records? I don't quite see how the portal helps in that. I didn't go too deep into this, but I think I would find the relevant ProjectSchedules records and import them into the eMails table - with a concatenated field preventing duplicates through validation.

LaRetta · June 12, 2009

Don't you need to CREATE the email records? I don't quite see how the portal helps in that.

Yes, I need to create the emails. And the portal IS the find. I only display the portal to show you how I want to restrict the record-set. I feel it is faster to use a relationship than perform a find, since the number of records in both tables are huge and will continue to grow.

So I will GTRR to this 'portal set' and then import them. With the proper relationship in place, the portal can be checked many times a day to see if there are any children to send. A check for If [ portal::ProjSch ] is much quicker than performing a full find each time, no? And it can be checked several times a day. That's why I prefer using a relationship.

Edited June 12, 2009 by Guest

LaRetta · June 12, 2009

I have other relationships which will have records pop into them (based upon dates and other criteria) for sending automatic emails. Sometimes one change in a date can trigger sending 25 emails. The emails are sent in batches controlled and stored by system.

comment · June 12, 2009

I am saying this with hesitation because as I said, I didn't look too deeply into this. I don't think a relationship has a significant advantage over a find - they both look into the index for matches. And since the process must be scripted anyway(?), why not put as much of the logic as possible into the script (and eliminate the relationship altogether)?

The one exception I would make is the prevention of duplicates, because that couldn't be done by the script alone anyway - at least not simply.

LaRetta · June 13, 2009

Duplicates ARE eliminated in this process (because they are filtered out of the relationship). And for once I must disagree about the find vs. a relationship.

One of these relationships (a portal) may not have a record in it for weeks and then suddenly have several in one day. But the process must check (sometimes several times a day) to see if there are any records to send. I have script which checks each portal and only runs if it has related records.

Compare:

1) Look at the relationship and if child records fire script to GTRR then import them vs.

2) Peform find on child table when 90% of the time you will find no related records AND when the find is based upon whether prior records have already been generated for those records in the eMail table.

How could If [ Child::ID } and proceed only if there are records NOT be faster than performing the complete script only to discover there aren't any? We have robot which runs throughout the day checking for new actions which must be fired so script runs anyway.

Edited June 13, 2009 by Guest

comment · June 13, 2009

Let's simplify the situation a bit:

Target::gDate = Source::Date

When you do a GTTR, it needs to find the matching records, i.e. records in Source whose Date matches gDate in Target. How does it do that?

Ostensibly, if the cache is current, a GTRR could be faster than a find - assuming a GTRR will use the cache (something I am not sure of). Even so, if the Source table is being continually updated, then so must be the cache. So it's constantly consuming resources, and all for what? For that "several times a day" moment when the actual action needs to be performed.

LaRetta · June 13, 2009

Even so, if the Source table is being continually updated, then so must be the cache.

I envisioned the GTRR being somewhat equal to the find in resources and time. But since 99.99% of the time, the find would produce no results in many of these portals, then simply checking a relationship first would be best.

I realize table occurrences are loaded in the table of dependencies at start-up. But I thought, if they weren't displayed, they wouldn't be used (reindexed or recalculated) until checked by my script. But the ProjSched table's cache must constantly update anyway, even if not displayed or needed at all - simply because there is a table occurrence of it in another file?

Also, this is separate file. So if I use extra resources, is it in the email file or does it still affect the source file I am referencing (might this affect separation model issues as well)?

I appreciate you walking through this with me.

Update: I removed script comparisons; I think we both are aware of that comparison and I wanted to cut down on fluff.

Edited June 13, 2009 by Guest
Added update

comment · June 13, 2009

I think it comes down to whether:

If [ not IsEmpty ( Related::Field ) ]

is significantly more efficient than performing a find in the Related table. I don't know the answer for sure. My theory is that the answer is no, but it's only my theory. I think this would be a good question to take up with FMI at DevCon.

However, even if the answer turns out be yes, you still need to consider the balance here: you've built two multi-predicate relationships just to avoid the find - which 99.99% of the time would produce no results. But 99.99% of what? Suppose the find runs every half-hour during business hours. That's only 16 finds per day.

I would certainly use the relationships for the test, if they were already there. But I don't think I would build them, if that's their only purpose.

LaRetta · June 13, 2009

I am quite surprised by all of this, Michael. I never would have guessed (obviously, smile) that a find through a large data-set might be as efficient as simple test for related (all resources considered).

Unfortunately, we have no meter we can attach which spikes an efficiency graph when we make schema changes; I wish there was. And I can think of no speed tests that could be performed but that doesn't mean I won't try. I guess you are right, only FM Engineers might know the answers.

As for my current situation? I can't change what is in place. But we are building our next release which will take full advantage of vs. 10 so script triggers should eliminate this specific issue.

LaRetta · June 13, 2009

So yes, I'm going to try a find instead of building a relationship in this instance!

So can you help me with a find then (just on that first relationship SchedCompl eCode H? I still don't see a find as possible (and certainly not as easy) in this situation (as a relationship) but that is probably because my Find techniques have slipped from lack of use (preferring relationships as filters in many instances such as this one)

Any emails with eCode H, do NOT find those CapReqIDs. Go to ProjSched and search the DateStart field only if the SubstComplID has something in it. And that SubstComplID field contains the CapReqID.

The date range to find is Get ( CurrentDate ) & ".." eMails::cSubstantCompleteNoticeDate.

Because, unless I create a List() within eMail which holds those CapReqIDs with eCode H, I'll have to first perform a find in eMails for them then put them into a list, then use that list to omit them from my found set in ProjSched.

Changing one's ways isn't always easy but it is always exciting. It WOULD be really nice to ditch all those calcs in eMail (which are used just to isolate the eCode H so I can create a List() of those CapReqIDs!!

UPDATE: I could handle the find but I am convinced your find technique would be much better (I've seen a lot of your work). Where do I draw the line? Do I create an eMail_All table for cartesian join like I have to use to generate the List()? or do I find in the HUGE eMail table? Then I'll need to create multiple (omit) requests for each CapReqID in the list and ...

Does it sound like I'm just being lazy? I'm not. But if I'm changing from filtered relationships (for advanced finds) to actual finds, I'd like to begin picking up the best techniques for replacing those pieces.

Edited June 13, 2009 by Guest
Added Update

LaRetta · June 13, 2009

Um, I'm going to try myself - I don't want you to think I'm a lazy dog. I just can't import using validation unique because there are other CapReqID without an eCode which can be duplicates. And there are other CapReqID's with other eCodes which can be duplicates. In this instance, there should only be ONE CapReqID with eCode H allowed. But I tried using this technique before and couldn't find how to allow unique sometimes and not other; therefore, I felt I'd have to use a list of those eCode H already in email and omit them during the find itself.

And I know how to include multiple omits; I just was very curious as to how you would approach it; whether you might use a hybrid (and use a relationship for part of it). We'll see what I come up with but I'm convinced yours would be much prettier. :wink2:

NB: I didn't realize how much I've depended upon filtered relationships instead of finds. I need to rethink many of my solutions because I find myself shaking with withdrawals just THINKING of giving up filtered portals (whose only purpose was to perform a find). :crazy2:

Edited June 13, 2009 by Guest
Added sentence

LaRetta · June 14, 2009

I just can't do this (use a find alone) because there will be thousands of existing CapReqID with eCode H emails. That will mean thousands of Omit Requests. I suppose I can flag the records in ProjSched but I've been working hard at keeping this eMail file a separate module and leave no footprint on the main solution (data side). And marking should NOT be necessary anyway - that's the purpose of relationships!

Filtering down the relationship to <> CapReqIDs with eCode H will help a lot but then, that's the majority of what my attached demo is made of. Maybe a GTRR on matching CapReq then Show Omitted and Constrain by date ... but that is same as my file except I include the date range in the relationship as well.

I honestly don't know what else to try. But I still believe you on your comparisons. It's just that, when the record numbers are very large and the filtering is complex, I wonder if there is a line which gets crossed on which is best to use.

Edited June 14, 2009 by Guest
Added paragraph

comment · June 14, 2009

I am trying to find a simple statement - in human language - of the real requirements here:

Suppose I go into the ProjSched table, and find records within the required date range. Now what? Do I understand correctly I need to eliminate records with the same CapReqID and of the same type*, that already exist in the eMails table - and import the rest?

---

(*) by type I mean in which of the 2 fields the CapReqID is replicated.

LaRetta · June 14, 2009

Suppose I go into the ProjSched table, and find records within the required date range.

You cannot start by finding the date range. The records in Project Schedule cover MANY different actions (substantial completion is only one of them). First you must look to the SubstCompID field. The CapReqID is stored here, telling us that the date record is the start and end dates of the Substantial Completion phase of the schedule.

So you will search for SubstComID = * and the date range.

Now what? Do I understand correctly I need to eliminate records with the same CapReqID and of the same type*, that already exist in the eMails table - and import the rest?

If, within your found set of Project Schedule records, you have a CapReqID 44 … AND, if eMail has a CapReqID 44 AND the eCode type in email is H then it should be omitted. eCode H in email is the email template. It means that a standardized auto email is sent to staff telling them that the project is going to enter the Substantial Completion phase in 14 days. The email table holds many other types of eCodes assigning staff to projects, telling them a project is closed, telling them to contact Vendors etc. None of the other eMail codes are considered in *the type.

(*) by type I mean in which of the 2 fields the CapReqID is replicated.

If CapReqID 44 and eCode H exists in email, it should be eliminated from your found set. But if CapReqID 44 and eCode (blank) exists in email then CapReqID 44 should NOT be eliminated from your found set. And if CapReqID 44 and eCode C exists in email, it should NOT be eliminated from your found set … we should only eliminate the CapReqID if it has already been sent an eCode H.

The SubstCompl portal in my file shows exactly what should be imported – the filter eliminates what shouldn’t import. So a CapReq should only get ONE eCode H action, ever and only if its CapReqID exists in the SubstCompID field.

I really tried to use human language and keep it as clear as possible. :king:

comment · June 14, 2009

OK, let me try something simple and hopefully this can be extended to meet all your requirements.

Go to ProjSched and search for SubstComID = * and the date range.

If you find any, import them using this mapping:

ProjSched::CapReqID -> eMails::CapReqID

ProjSched::SubstComID -> eMails::SubstComID

Of course, before you can do this, you must have a SubstComID field in eMails, and you must set its validation to Unique, Validate always.

---

BTW, I really don't like the way records are tagged in ProjSched. I know you said you cannot change it, but it needs to be said. There should be an ID, a date and ONE type field.

Edited June 14, 2009 by Guest

LaRetta · June 15, 2009

ProjSched::CapReqID -> eMails::CapReqID
ProjSched::SubstComID -> eMails::SubstComID

OMG! Even though both fields will hold the CapReqID, the one in SubstComID field will be what makes it eCode H and can therefore be validated as unique! I was trying to validate on eCode H which I didn't have!

So I don't need to perform a find for those CapReqIDs in eMail nor do I need to create a List() calculation (and table occurrence) of them! This is very simple and it reinforces your suggestion that using a find can be as simple (and even more efficient) than using a multi-predicate relationship!

Thank you so very much for hanging in there with me, I realize it probably wasn't fun. The solution I was grasping for was a bit funky but the underlying principle, which became the TRUE GEM of this thread, far outweighed it!! I think I'm not the only Developer who might be a bit too quick on the 'create relationship' trigger just for finds; I read of it happening every day.

BTW, I won't judge the Project Schedule - that is a complex scheduling module where the requirements had been changed repeatedly on the Developer. But I know that it IS a piece which is planned for complete re-write in our new release.

I always appreciate you saying what you feel should be said. Three cheers for a single find and two-less table occurrences!!! The eMail graph breathes a bit easier already! BTW, yes, this meets ALL my requirements!!

:yourock:

Edited June 15, 2009 by Guest
Added last sentence

comment · June 15, 2009

Just keep in mind it's a balance: You get 0 relationships, no complex calcs to build the keys, and all the logic in the script (except the validation part). OTOH, a major part of the load falls on the validation, and it's worth watching how well it performs as the number of eMails increases.

I believe in THIS case the balance will favor the find - due to the complexity of the relationships required otherwise. But I would still like to hear the opinion of a qualified FMI engineer re the general question. i.e. building relationships for the sole purpose of finding records (or checking the existence of).

Sign In

Filtered Relations

Recommended Posts

LaRetta

comment

LaRetta

LaRetta

comment

LaRetta

comment

LaRetta

comment

LaRetta

LaRetta

LaRetta

LaRetta

comment

LaRetta

comment

LaRetta

comment

Create an account or sign in to comment

Create an account

Sign in

Browse

Site Support

Forums

Blogs

Marketplace

Activity

Important Information