Jump to content
Claris Engage 2025 - March 25-26 Austin Texas ×
The Claris Museum: The Vault of FileMaker Antiquities at Claris Engage 2025! ×

Filter for unique records with 50k+ records?


This topic is 5326 days old. Please don't post here. Open a new topic instead.

Recommended Posts

Posted

I have a table relationship for a school system:

Campus -> Class-> Enrollment (a join table including date, classID, studentID) -> Students

I'm trying to calculate for a given time-range, the number of students who were enrolled by campus. I don't want to double-count students, so I need the # of unique students.

I would like to run this report as calculations from within the campus table. This can be done by setting up a predicate relationship (with a start and end date) between the Campus and Student table. Then simply use Count(Student::ID) to get the # of students. Works very nicely.

However -- there's a case where I need to do this report using a classification (part time vs. full time) that exists in the Enrollment Join table.

This is where I'm stuck -- I can no longer just use Count(Student::ID) since ignores the PartTime variable in the join table.

If I use Count(Enrollment::PartTime) then unfortuantely I get duplicates (since a student may be enrolled in 2+ classes for 2+ months).

I've tried using a custom function which filters a list for unique values, but the problem is that I have 100k+ items and it seems the FM limit is about 50K for recursive functions. And even when I'm only using 1k to 10k records, this functions is slowwwww..

Possible solutions:

* run the report from the Students table -- while this would work, my entire reporting system is set up to run from the Campus table, so I'd like to avoid this if possible.

* is there a custom function to filter a list for unique values that uses tail-recursion and can handle 50k+ items?

* value lists can filter for unique values -- how do these compare performance-wise to recursive list functions?

* write a script to do this :

for each campus

perform find (students enrolled between startDate, endDate)

go to related records (students)

$n = get (FoundCount)

store $n in report table

etc.

loop

* I'm on FMSA9 right now, but can move to FM 11 if that would help.

Ideas?

Posted

I'm trying to calculate for a given time-range, the number of students who were enrolled by campus.

...

This can be done by setting up a predicate relationship (with a start and end date) between the Campus and Student table.

I don't see how this can work, when the Student table does not have a campus nor a date of enrollment field.

However, if you put your global filter fields in the Class table, you can do both.

Posted

When I started thinking about the performance problems with my unique custom function, I realized the algorithm I was using (http://www.briandunning.com/cf/596 ) is very slow, as it uses RightValues and FilterValues.

So, I wrote highly optimized versions which seem to help a bit:

http://fmforums.com/forum/showtopic.php?fid/35/tid/215388

I think this doesn't solve the 50,000 item limit, but it does seem quite a bit faster.

Posted

I don't see how this can work, when the Student table does not have a campus nor a date of enrollment field.

However, if you put your global filter fields in the Class table, you can do both.

Sorry, I left out some of the key fields: every record in the Join table includes StudentID, Date, ClassID, Campus, etc.

Posted

But you said the relationship was "between the Campus and Student table". I also don't see why Campus should be in the join table.

Perhaps I am missing something, but it seems very simple to me. Given:

Campus -< Class -< Enrollment >- Students

filter the relationship between Class and Enrollment, and count Students::StudentID.

Posted

I'm over-simplifying this for sake of discussion, so I may have over-simplifed out the actual hard part :

Counting studentIDs is easy -- the problem is more complex in that I have a bunch of "kinds" of students (I had said "part time vs. full time" but actually it's about 14 different categories). That category value is only stored in the Enrollment Table.

In past reports, I simply created a bunch of unstored report calcs in the Enrollment table ( cStudentIsTypeAlpha, cStudentIsTypeBeta etc.) Each of these would be either nil or contain the # of credit hours.

Then, I had a bunch of calcs in the Campus table which would either Sum() or Count() these related values in the Enrollment table.

I did it this way as then I can have a single relationship and put most of the logic in the calculations (since it's very hard to have a relationship based on a "OR" predicate, such as "Student age < 25 OR Student = disabled". Making 14(or so) relationships to capture these values seemed overly complicated at the time.

Posted

If it's too difficult to filter the relationship, produce the report from the Enrollment table after performing a find. To get the count of unique students for each campus, see:

http://fmforums.com/forum/showtopic.php?tid/196904/

Posted

That technique described (calculating # of unique rows by of having a calc field equal to 1 divided by a summary count field, and then adding those up) is pretty neat trick.

Question: I wonder if this will this work through a relationship?

Posted

As I stated at the top, my current system is running reports from the Campus table, so much of my motivation is "tradition". The big report is actually a cross-tabs built from within the campus table -- cross tabs are never easy to do in filemaker, so it's a bit of a slog to get them working. Once it was working however I decided I liked running the reports from the campus table, so I'm motivated to find a way to keep this design.

I've updated http://fmforums.com/forum/showtopic.php?fid/35/tid/215388 with a new, faster uniqueListCount function which I think is going to do what I need -- it seems to be able to handle 500,000 record summaries that I need in semi-decent time.

Posted

I am afraid you have lost me completely. This started with running the report from the Campus table - but you said it's difficult to define the filtering relationship to the Enrollment table.

If you CAN define such relationship, you can get the count of unique students very easily by counting the StudentID in the Student table.

Posted

As I stated at the top, my current system is running reports from the Campus table, so much of my motivation is "tradition". The big report is actually a cross-tabs built from within the campus table -- cross tabs are never easy to do in filemaker, so it's a bit of a slog to get them working. Once it was working however I decided I liked running the reports from the campus table, so I'm motivated to find a way to keep this design.

I've updated http://fmforums.com/forum/showtopic.php?fid/35/tid/215388 with a new, faster uniqueListCount function which I think is going to do what I need -- it seems to be able to handle 500,000 record summaries that I need in semi-decent time.

50,000 or 500,000?

And note - it does not work correctly as written. See my comments there.

Posted

I am afraid you have lost me completely. This started with running the report from the Campus table - but you said it's difficult to define the filtering relationship to the Enrollment table. If you CAN define such relationship, you can get the count of unique students very easily by counting the StudentID in the Student table.

Basically, I'm trying to do this:

Select Count(Distinct StudentID) from Enrollment where (Enrollment.ClassType =1 or Enrollment.ClassType = 4) and Enrollment.date > StartDate and Enrollment.date <= EndDate

Repeat this for about two dozen calculations and put the result in a cross-tab table by campus.

FileMaker's lack of Distinct/Unique calcs make this difficult.

Also, since the predicate is an OR relationship, you can't do it using built-in FM relationship predicates. Which means you have to do a calc field in the join table. And this calc field needs to be stored & indexed. (I think, right?)

This join file is already super-humongous, so even if I could make the calcs stored & indexed I'd prefer not to (since these are calcs used only for reporting).

Thus, my reasons for wanting the Unique() function.

Posted

since the predicate is an OR relationship, you can't do it using built-in FM relationship predicates.

Of course you can. Enter "1¶4" in the global ClassType field and that's it.

Posted

True, but remember this is a cross-tabs -- your technique would work for one such calculation, but if you have a dozen such OR calcs, not so much.

Posted

Oops -- I spoke too soon -- it does seem that now I am hitting either the 10,000 stack or 50,000 iteration maximum in at least some of the calcs, so my declaration of Victory is a bit premature.

The other issue with the calc having a bug has been fixed as per your suggestion, thx!

Posted (edited)

I don't see why not, but then I already said I don't really understand the issue. You gave one example and I addressed that. You assume I can understand the "dozen" from the single example - I am afraid that's an overestimation of my abilities.

---

P.S. Why does this thread also discuss your other one?

http://www.fmforums.com/forum/showtopic.php?tid/215388/

Edited by Guest
Posted

I don't see why not, but then I already said I don't really understand the issue. You gave one example and I addressed that. You assume I can understand the "dozen" from the single example - I am afraid that's an overestimation of my abilities.

I'm probably not explaining this well -- but please also remember from the first post, I did say I'm looking for a particular solution that fits in the existing design. You've made excellent suggestions, they just aren't "right" when compared to my existing design schema.

P.S. Why does this thread also discuss your other one?

http://www.fmforums.com/forum/showtopic.php?tid/215388/

That thread is about a custom function to solve a particular problem which may be useful to others, whereas this thread is more about the general problem of multi-prediacate relationships & reporting. If someone wants to merge the threads that's fine, but I think they are separate enough to stand alone. No big deal either way.

Posted

Well, I don't know. I'd say if the existing schema doesn't do the job ...

I don't think the threads should be merged - but neither should posts discussing your custom function appear here.

This topic is 5326 days old. Please don't post here. Open a new topic instead.

Create an account or sign in to comment

You need to be a member in order to leave a comment

Create an account

Sign up for a new account in our community. It's easy!

Register a new account

Sign in

Already have an account? Sign in here.

Sign In Now
×
×
  • Create New...

Important Information

By using this site, you agree to our Terms of Use.