Newbie Needs Help "Counting" Occurrences...

Dr Kyle · November 22, 2004

I have a series of ten text fields which each contain the same type of data (names of fruit fly genes). There are ~200 records in the file, all having gene names in these ten text fields. What I need to do is add a calculation field (one for each text field, e.g. ten calc fields??) which takes the gene name in each of these ten text fields and counts how many times the same gene appears in all ten fields and IN ALL ~200 RECORDS.

I have no idea where to start with this one... I don't even know which function I would use here. Can anyone make suggestions? I can make a screen cap, show it, and elaborate on what I want to do if need be.

Thanx guys!

KYLE

dkemme · November 22, 2004

Create a calc field that is a number and use PatternCount.

Dr Kyle · November 22, 2004

I don't suppose you can give me any further info on how to use PatternCount, can you? Thanks.

Dr Kyle · November 22, 2004

Well, I used the following calculation...

PatternCount( Fly_Complex_Protein1; Fly_Complex_Protein1 and Fly_Complex_Protein2 and Fly_Complex_Protein3 and Fly_Complex_Protein4 and Fly_Complex_Protein5 and Fly_Complex_Protein6 and Fly_Complex_Protein7 and Fly_Complex_Protein8 and Fly_Complex_Protein9 and Fly_Complex_Protein10 )

This calc works OK... adding up how many times the gene name in the Fly_Complex_Protein1 field occurs across all ten Fly_Complex_Protein fields in that one record, but it does not go across multiple (e.g. all) records.

How do I add that part on? Actually, upon looking again, for some records it comes back with a result of zero (0)... when in fact the lowest you could have is 1 (the field you are searching with means there is at least one occurrence of it).

Any ideas what I am doing wrong?

Thank you guys so much!

KYLE

-Queue- · November 22, 2004

field1 and field2 and field3 and ... and field10 returns either zero or 1, depending on whether all fields have a nonzero value.
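To see why the original calculation can come back as zero, here is a rough sketch of the logic in Python (not FileMaker code). It assumes that a text field is reduced to its digits when it is treated as a Boolean, and that PatternCount then ends up searching for the text "1" or "0" inside the gene name. The helper names and the sample gene names are made up for illustration.

```python
def as_number(text):
    """Assumed coercion: keep only the digits of a text value ("CG4567" -> 4567)."""
    digits = "".join(ch for ch in text if ch.isdigit())
    return int(digits) if digits else 0


def fm_and(*fields):
    """Stand-in for 'field1 and field2 and ...': 1 if every value is nonzero, else 0."""
    return int(all(as_number(f) != 0 for f in fields))


def pattern_count(text, search):
    """Stand-in for PatternCount: case-insensitive count of search within text."""
    search = str(search)
    if not search:
        return 0  # an empty search string counts as no match
    return text.lower().count(search.lower())


record = ["CG4567", "CG4567", "CG8899"]  # hypothetical gene names in one record

flag = fm_and(*record)                   # every field is nonzero, so the chain gives 1
broken = pattern_count(record[0], flag)  # looks for the text "1" inside "CG4567"

# Comparing the first field against each of the other fields, one at a time
fixed = 1 + sum(pattern_count(record[0], other) for other in record[1:])
```

With these sample values the chained version finds nothing, because "CG4567" contains no "1", while the field-by-field version counts the gene itself plus its one repeat.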

Instead try

1 + PatternCount( Fly_Complex_Protein1; Fly_Complex_Protein2 ) + PatternCount( Fly_Complex_Protein1; Fly_Complex_Protein3 ) + PatternCount( Fly_Complex_Protein1; Fly_Complex_Protein4 ) + ... + PatternCount( Fly_Complex_Protein1; Fly_Complex_Protein10 )

or

1 + PatternCount(

Dr Kyle · November 22, 2004

Ok, I think I get it. I will try that out and see how far I get before I need more help on the summary field part.

By the way, what does the paragraph symbol in the second calc you show mean/do?

As I said in the title, I am pretty much a newbie to everything besides table relationships and using them for lookups.

KYLE

Dr Kyle · November 22, 2004

Ok, now I have it adding up the number of times each gene occurs IN THAT RECORD, but how do I go about having it report the number of times it finds that gene across all ten fields in all ~200 records total?

Some kind of summary calc? Sorry I am so dumb here.

-Queue- · November 22, 2004

Read the final paragraph in my original post.

Dr Kyle · November 22, 2004

Oh, a carriage return... I gotcha on the use of that symbol then. In my case it won't really matter, as the search terms are always specific enough that only "completely matching" answers get counted.

Thank you for the explanation though!

KYLE
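Whether the carriage-return delimiters matter depends on the gene names. A minimal sketch in Python (not FileMaker code), assuming that one gene name could be a prefix of another, such as the made-up pair CG123 and CG1234; the helper names are hypothetical, and wrapping both values in a separator is only one common way to force whole-value matches:

```python
def pattern_count(text, search):
    """Stand-in for PatternCount: case-insensitive count of search within text."""
    if not search:
        return 0  # an empty search string counts as no match
    return text.lower().count(search.lower())


def whole_value_count(text, search, sep="\n"):
    """Wrap both sides in a separator so that only complete values can match."""
    return pattern_count(sep + text + sep, sep + search + sep)


plain = pattern_count("CG1234", "CG123")       # substring match: counted
strict = whole_value_count("CG1234", "CG123")  # whole-value match: not counted
exact = whole_value_count("CG123", "CG123")    # identical names still match
```

If no gene name in the file is ever contained in another, the two approaches give the same counts.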

Dr Kyle · November 22, 2004

(summaryfieldX; Fly_Complex_ProteinX)

So I create a new number field called "Protein1_Occurence_Summary" and make it a calculation which = what?

Sorry, I just don't get the lingo in your description above very well. I am dense.

-Queue- · November 22, 2004

Make it a summary field, not a calculation, that is a 'total of' the relevant calculation field.

Dr Kyle · November 22, 2004

Oh, ok.. I will try that.

KYLE

Dr Kyle · November 22, 2004

I don't know what I did wrong, but the summary field isn't working right. And when I checked what the first "PatternCount" field did, it is still only looking IN THAT SAME RECORD, NOT ACROSS ALL RECORDS AND ALL TEN FIELDS. Man, I must be really dense here...

I think if the "PatternCount" was doing what I needed, then I wouldn't need the summary.

Essentially, to restate the problem... I have 223 records in a file, all of which have ten text fields named Protein1, Protein2... Protein10. Each position in every record is either entirely blank or has a gene name in it (the letters CG followed by several numbers). In each record, I want to create a new field corresponding to each of the ten gene fields which checks how many times the gene in, say, the Protein1 field occurs not only in the other nine ProteinX fields of THAT RECORD, but in all TEN ProteinX fields of the other 222 records. Essentially, I want to know how many times each and every gene occurs in the file, regardless of which position it appears in (Protein1-10). Thus, when I am looking at Record 134 for example, I want to see the gene names in the ProteinX fields, and next to each one, in the "ProteinX_count" field, how many times that gene occurs in the entire file (223 records). I have attached my work thus far, so you can look at it if you choose.

Sorry I am a novice here guys.

KYLE

FlyWork.zip

-Queue- · November 23, 2004

Any function will refer to only the current record unless forced to do otherwise. There is no easy way to compare all records simultaneously. That is what the summary fields are for: to total all occurrences.

However, you have explained a different problem in this post as opposed to your first, IMO. To accomplish this, it would be much easier to use a related file and one record for each protein, with up to 10 proteins (related records) per parent record. It wouldn't matter which data was in which record because it would all be in the same related field. You could create a value list based on the field and some tricky repeating calculations to summarize the data.
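The one-record-per-protein idea can be sketched outside FileMaker. The Python below is only an illustration with made-up record numbers and gene names (three positions per record instead of ten): each non-empty protein becomes its own row tied to its parent record, and the file-wide total for a gene is then a simple count over that single column.

```python
from collections import Counter

# Hypothetical parent records: record number -> protein positions (blank = empty)
records = {
    1: ["CG1234", "CG5678", ""],
    2: ["CG5678", "CG1234", "CG1234"],
    3: ["", "CG9999", "CG5678"],
}

# One row per non-empty protein, as in a related file: (parent record, gene name)
related_rows = [
    (rec_id, gene)
    for rec_id, genes in records.items()
    for gene in genes
    if gene
]

# The position no longer matters, so counting is a single pass over one column
totals = Counter(gene for _, gene in related_rows)
```

Here `totals["CG1234"]` covers every position in every record, which is the number wanted next to each gene name.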

Using the same idea on a single file poses problems because a piece of data may be related to three fields in a particular record, but would only appear once if you tested the relationship. I think you may need to hard-code globals or calculations with each possible result and create a relationship from each result to all ten fields. Not very pretty nor pleasant.

Let me think about it for a while. Or perhaps someone (ahem, Ugo) will have a good solution in the meantime.

Dr Kyle · November 23, 2004

Unfortunately, the original FileMaker file is being generated by others and involves ever-changing gene names... so I don't think the option you suggest will work in my case.

I really appreciate all your help on this. Maybe it just can't be done the way I want it to be.

Oh well I guess...

Kyle

I am done screwing with it for tonight at least.

-Queue- · November 23, 2004

Okay, here's the best method I've found so far.

Create a calculation text field combinedProteinAll with ten repetitions, equal to

Let([
  R = Get(CalculationRepetitionNumber);
  E = Evaluate( "Extend(GetField(\"Protein\" & " & R & "))" ; [Protein1; Protein2; Protein3; Protein4; Protein5; Protein6; Protein7; Protein8; Protein9; Protein10] )
];
  Case( not IsEmpty(E); E & "_" & R & "_" & Extend(Record_Number) )
)

and a value list AllProteins based on this field.

Then create a repeating number field repTotal with 10 repetitions and put it on the List View screen formatted to display all repetitions, and a script Update repTotal that does the following:

Freeze Window
Go to Layout [List View]
Go To Field [Gavin&Ho_Yeast_Complexes_To_Fly::repTotal]
Loop
  Set Field [Let( F = GetField("Protein" & Get(ActiveRepetitionNumber));
    Case( IsEmpty(F); ""; PatternCount(

Dr Kyle · November 23, 2004

I will work my way through your last post and see what I come up with. I have to admit though, on first read through I don't even understand 90% of your post. My fault, not yours. I really appreciate your time and effort on this. You are a wiz!

I have another, separate, likely MUCH EASIER question to ask about relationships, which I will post in a new thread though...

Kyle

-Queue- · November 23, 2004

The combinedProteinAll calculation is used for the value list. A value list will consist of only unique values. The addition of the current repetition number and the record number allows for all proteins on all records to be seen as unique, so that the value list will not ignore what it would consider duplicates. If you put the calc on a layout and format it to display all repetitions, you'll see that it basically puts Protein1 in the first rep, Protein2 in the second rep, etc. for each record, with _repetition_record number at the end of each one.
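As an illustration of why the suffix is needed, here is a small Python sketch (not FileMaker code) in which a set plays the role of the value list, since both keep only unique values. The record numbers and gene names are made up, and the final counting step is just one plausible way to use the tagged values, because the script's own counting expression is cut off in this thread.

```python
# Hypothetical records: record number -> protein positions (blank = empty)
records = {
    1: ["CG1234", "CG5678"],
    2: ["CG1234", ""],
}

# Without a suffix, the repeated gene collapses into a single entry
plain_values = {gene for genes in records.values() for gene in genes if gene}

# With _repetition_record appended, every occurrence stays distinct
tagged_values = {
    f"{gene}_{rep}_{rec_id}"
    for rec_id, genes in records.items()
    for rep, gene in enumerate(genes, start=1)
    if gene
}

# Counting occurrences of one gene by looking at the part before the first "_"
cg1234_count = sum(1 for value in tagged_values if value.split("_")[0] == "CG1234")
```

The plain set holds two entries while the tagged set holds three, so the second occurrence of CG1234 is no longer discarded as a duplicate.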

I used Evaluate in this instance because a field must be indexable to be used for a value list. Normally, the use of GetField requires an unstored calculation so that it refreshes constantly. With the addition of Evaluate, you can specify it to refresh only when one of the listed fields is changed (Protein1 through Protein10, enclosed by brackets) and still allow it to be indexed.

The text inside of Evaluate merely creates the calculation to be performed when one of the Protein fields is changed. If you put a text calculation of

Let( R = Get(CalculationRepetitionNumber); "Extend(GetField(\"Protein\" & " & R & "))" )

on a layout, the result would be Extend(GetField("Protein" & Rx)), where Rx is between 1 and 10, depending on the current repetition. When this expression is evaluated, it will grab ProteinX and Extend it to the current repetition. Extend is required when using a normal field with a repeating field. It basically 'extends' the value of the normal field so that it treats it as a repeating field, with the value being the same in all repetitions.

Once we have a value list based on this calculated repeating field, the script uses a similar idea to count how many times the expression
