Recursive Custom Function??

June 14, 201016 yr

Hi again. So, I've been told I need to use a "recursive custom function" to accomplish a task, and I'm wondering if that sounds right.

I have about 1500 records that represent individual chimpanzee urine samples. Each record has a field for "cortisol". This represents the amount of the hormone cortisol found in the sample. I am trying to establish a baseline cortisol level for each individual chimpanzee. I currently have the "urine sample" table linked to the "chimp" table via "chimpID#", and the new "baseline" field will live in the "chimp" table. Basically what I need the "baseline" field to do is the following:

1. Calculate the mean of the numbers

2. Calculate the standard deviation (SD)

3. Remove all numbers 2*SD above the mean, AND remove all number 2*SD below the mean

4. Repeat steps 1-3 until there is only 1 number left - that number is the baseline

I want to make sure a "recursive custom function" is the right way to go, before I try to figure out what the heck it is and how to do it. Also, does this seem like it would be excessively difficult to do?

Thanks!

Sholly

June 14, 201016 yr

I think it would be easier to do this with a script. How often will you need to perform this calculation? IOW, are there to be more samples that will affect the baseline?

---

BTW, could you point to some reference regarding your proposed method?

June 14, 201016 yr

Author

There will be more samples in the future, but this is the kind of thing that may only need to be re-calculated once every few years, not on the fly.

Regarding a reference for my method. I think you mean the method of calculating baseline hormone levels (but maybe you mean who told me a custom fuction would be good). The developer I've been working with a bit told me about the recursive function. As far as the method for calculating the baseline goes, I do my laboratory work at the Smithsonian and it is standard practice there. We usually cite the following pubs, but there are more.

Brown, J.L., Wasser, S.K., Wildt, D.E., Graham, L.H., 1994. Comparative aspects of

steroid hormone metabolism and ovarian activity in felids, measured

noninvasively in feces. Biol. Reprod. 51, 776–786.

Pelican, K.M., Brown, J.L., Wildt, D.E., Ottinger, M.A., Howard, J.G., 2005. Short term

suppression of follicular recruitment and spontaneous ovulation in the cat

using levonorgestrel versus a GnRH antagonist. Gen. Comp. Endocrinol. 144,

110–121.

June 14, 201016 yr

Yes, I meant your method for establishing a baseline. I am by no means an expert in statistics, so the fact that I have never heard of such method may be of no account.

Still, I am willing to venture a guess: I believe that it's very likely that the proposed algorithm will enter an endless loop when more than one sample meet the criteria of Mean ± 2*StDev, but there are no more samples to discard.

You can check this very easily by defining two summary fields (Average and StDev) and a script that finds records within the range. Run the script several times and see if you ever get to a single sample.

June 14, 201016 yr

Author

I am no expert in statistics either! I just do what I'm told by my advisors. lol.

I will try what you are suggesting on Filemaker.

I have done it on excel and it works for individuals who have larger sample pools. I can get down to one number relatively quickly. The problem arises in individuals who only have 5-10 samples or so. I often end up with 2/3 numbers remaining that are within the "acceptable range" (not +/-2SD), and then I have to go and just take the mean of those 3 numbers... and that becomes the baseline.

I'm thinking it may just be easier to do this in excel. I know that is a terrible thing to say on a Filemaker forum, but given that I am no expert, it may take me longer to figure it out than to just do it manually. Of course, when the day comes that I realize I made a mistake somewhere along the way and have to re-calculate everything by hand, I will probably wish I had done it in Filemaker.

June 14, 201016 yr

I have done it on excel and it works for individuals who have larger sample pools.

I got curious, so I tested this with a population of 1,000 samples generated randomly from a normal distribution with Mean ≈ 0 and StDev ≈ 1. It got stuck after 7 iterations with 888 samples remaining.

June 14, 201016 yr

Author

OK, you're totally right. And re-reading the references on the method, it looks like I am wrong that you need to/will ever get down to one number. The idea is obviously to remove all outliers from your baseline calculation. So, in your scenario, at 888 you would just take the mean and that would be your baseline.

It probably only seemed to work for me because I have tiny sample sizes (75 max per individual) and non-normally distributed values.

Thanks, once again, for solving a problem that I didn't even know I had yet!

June 14, 201016 yr

I think the attached file does what you ask for.

After looping through the data removing all outside the +/- 2 SD's, it takes the mean of the remaining values as the baseline.

Note that the script assumes that your data values are within (-1E10, +1E10). If your values are higher, adjust the constant values in the script to bracket your data set.

This method adds a table occurrence to the relationship graph. That is used to filter the outliers and provide the list for FileMaker's built-in standard deviation and mean value functions.

BaselineCalc.fp7.zip

Edited June 14, 201016 yr by Guest

June 14, 201016 yr

I am still a little skeptical about the method (is there an Excel function that does this?), but take a look at the attached file.

Baseline.zip

June 14, 201016 yr

Author

No, no function in excel. Just calculating the numbers, then deleting those that are out of range "by hand".

June 14, 201016 yr

No, no function in excel.

That's what bothers me. Excel has a function for practically all standard statistical methods:

http://office.microsoft.com/en-au/excel-help/statistical-functions-HP005203066.aspx

The closest to your method, I think, is TRIMMEAN() - but that works quite differently.

June 14, 201016 yr

Author

Thank you Tominator. I will try to integrate this into my database and see how it goes!

June 14, 201016 yr

Author

Hmm, the TRIMMEAN is interesting. I think it might actually have similar logic though. Again, I'm no statistics expert, but it seems that in a normal distribuition 2 SDs is the equivalent to a 95.45% radius around the mean (the confidence interval most people use). Maybe this is the logic behind the 2SD method?

June 14, 201016 yr

I really don't know. I would suggest you consult a statistician. There certainly is SOME logic in your method - the question is whether it has the meaning you are looking for.

June 18, 201016 yr

Author

Hi again.

I tried Tominator's solution (I want to be able to export into statistical software, so a dedicated field for the baseline is best) and I'm doing something wrong. It must be with my relationship, but I just can't see the problem. Would either of you guys be willing to take a look?

I have attached a simplified version of my actual file. As far as I can tell, it mirrors Tominator's solution exactly. The only difference I can see is that "value" in Tominator's solution is a number and mine is a calculation. Not sure if that makes a difference.

Thanks!

Cortisol_Baseline.zip