Jump to content
Server Maintenance This Week. ×

Recursive Custom Function??


This topic is 5074 days old. Please don't post here. Open a new topic instead.

Recommended Posts

Hi again. So, I've been told I need to use a "recursive custom function" to accomplish a task, and I'm wondering if that sounds right.

I have about 1500 records that represent individual chimpanzee urine samples. Each record has a field for "cortisol". This represents the amount of the hormone cortisol found in the sample. I am trying to establish a baseline cortisol level for each individual chimpanzee. I currently have the "urine sample" table linked to the "chimp" table via "chimpID#", and the new "baseline" field will live in the "chimp" table. Basically what I need the "baseline" field to do is the following:

1. Calculate the mean of the numbers

2. Calculate the standard deviation (SD)

3. Remove all numbers 2*SD above the mean, AND remove all number 2*SD below the mean

4. Repeat steps 1-3 until there is only 1 number left - that number is the baseline

I want to make sure a "recursive custom function" is the right way to go, before I try to figure out what the heck it is and how to do it. :) Also, does this seem like it would be excessively difficult to do?

Thanks!

Sholly

Link to comment
Share on other sites

I think it would be easier to do this with a script. How often will you need to perform this calculation? IOW, are there to be more samples that will affect the baseline?

---

BTW, could you point to some reference regarding your proposed method?

Link to comment
Share on other sites

There will be more samples in the future, but this is the kind of thing that may only need to be re-calculated once every few years, not on the fly.

Regarding a reference for my method. I think you mean the method of calculating baseline hormone levels (but maybe you mean who told me a custom fuction would be good). The developer I've been working with a bit told me about the recursive function. As far as the method for calculating the baseline goes, I do my laboratory work at the Smithsonian and it is standard practice there. We usually cite the following pubs, but there are more.

Brown, J.L., Wasser, S.K., Wildt, D.E., Graham, L.H., 1994. Comparative aspects of

steroid hormone metabolism and ovarian activity in felids, measured

noninvasively in feces. Biol. Reprod. 51, 776–786.

Pelican, K.M., Brown, J.L., Wildt, D.E., Ottinger, M.A., Howard, J.G., 2005. Short term

suppression of follicular recruitment and spontaneous ovulation in the cat

using levonorgestrel versus a GnRH antagonist. Gen. Comp. Endocrinol. 144,

110–121.

Link to comment
Share on other sites

Yes, I meant your method for establishing a baseline. I am by no means an expert in statistics, so the fact that I have never heard of such method may be of no account.

Still, I am willing to venture a guess: I believe that it's very likely that the proposed algorithm will enter an endless loop when more than one sample meet the criteria of Mean ± 2*StDev, but there are no more samples to discard.

You can check this very easily by defining two summary fields (Average and StDev) and a script that finds records within the range. Run the script several times and see if you ever get to a single sample.

Link to comment
Share on other sites

I am no expert in statistics either! I just do what I'm told by my advisors. lol.

I will try what you are suggesting on Filemaker.

I have done it on excel and it works for individuals who have larger sample pools. I can get down to one number relatively quickly. The problem arises in individuals who only have 5-10 samples or so. I often end up with 2/3 numbers remaining that are within the "acceptable range" (not +/-2SD), and then I have to go and just take the mean of those 3 numbers... and that becomes the baseline.

I'm thinking it may just be easier to do this in excel. I know that is a terrible thing to say on a Filemaker forum, but given that I am no expert, it may take me longer to figure it out than to just do it manually. Of course, when the day comes that I realize I made a mistake somewhere along the way and have to re-calculate everything by hand, I will probably wish I had done it in Filemaker.

Link to comment
Share on other sites

I have done it on excel and it works for individuals who have larger sample pools.

I got curious, so I tested this with a population of 1,000 samples generated randomly from a normal distribution with Mean ≈ 0 and StDev ≈ 1. It got stuck after 7 iterations with 888 samples remaining.

Link to comment
Share on other sites

OK, you're totally right. And re-reading the references on the method, it looks like I am wrong that you need to/will ever get down to one number. The idea is obviously to remove all outliers from your baseline calculation. So, in your scenario, at 888 you would just take the mean and that would be your baseline.

It probably only seemed to work for me because I have tiny sample sizes (75 max per individual) and non-normally distributed values.

Thanks, once again, for solving a problem that I didn't even know I had yet!

Link to comment
Share on other sites

I think the attached file does what you ask for.

After looping through the data removing all outside the +/- 2 SD's, it takes the mean of the remaining values as the baseline.

Note that the script assumes that your data values are within (-1E10, +1E10). If your values are higher, adjust the constant values in the script to bracket your data set.

This method adds a table occurrence to the relationship graph. That is used to filter the outliers and provide the list for FileMaker's built-in standard deviation and mean value functions.

BaselineCalc.fp7.zip

Edited by Guest
Link to comment
Share on other sites

No, no function in excel.

That's what bothers me. Excel has a function for practically all standard statistical methods:

http://office.microsoft.com/en-au/excel-help/statistical-functions-HP005203066.aspx

The closest to your method, I think, is TRIMMEAN() - but that works quite differently.

Link to comment
Share on other sites

Hmm, the TRIMMEAN is interesting. I think it might actually have similar logic though. Again, I'm no statistics expert, but it seems that in a normal distribuition 2 SDs is the equivalent to a 95.45% radius around the mean (the confidence interval most people use). Maybe this is the logic behind the 2SD method?

Link to comment
Share on other sites

Hi again.

I tried Tominator's solution (I want to be able to export into statistical software, so a dedicated field for the baseline is best) and I'm doing something wrong. It must be with my relationship, but I just can't see the problem. Would either of you guys be willing to take a look?

I have attached a simplified version of my actual file. As far as I can tell, it mirrors Tominator's solution exactly. The only difference I can see is that "value" in Tominator's solution is a number and mine is a calculation. Not sure if that makes a difference.

Thanks!

Cortisol_Baseline.zip

Link to comment
Share on other sites

This topic is 5074 days old. Please don't post here. Open a new topic instead.

Create an account or sign in to comment

You need to be a member in order to leave a comment

Create an account

Sign up for a new account in our community. It's easy!

Register a new account

Sign in

Already have an account? Sign in here.

Sign In Now
×
×
  • Create New...

Important Information

By using this site, you agree to our Terms of Use.