Math between related records

TerenceM · November 18, 2004

I hope there's a solution for this problem. I would like to be able to show the result of a calculation using information from the current record and each related record appearing in a portal. The problem is its a many-to-many relationship. The problem is biology (genomics) related:

Two tables (sample data)

INSERTS

name

location (1000, 2000, 3000)

GENES

gene

start/end (500/1500, 700/2500)

relationship

location >= start and location <= end

(this isn't exactly how I do the relationship since non-identity relationships are astronomically slow, but the net effect is the same)

So the first record in INSERTS is related to both records in genes, and the second record in GENES in related to the first two records in INSERTS.

On a layout for INSERTS, I want to have a list of all the matching GENES in a portal. Easy enough. But I'd like to calculate how far the insert is from start. That is: 1000-500 for the first gene, and 1000-700 for the second gene. The problem is that many INSERTS can match a given GENE, but I don't know how to have the calculation pull data from the current INSERT instead of the first INSERT that matches the relationship. That is, when looking at the second INSERT (2000) which pulls up the second GENE (700/2500), it gives me the result 300 instead of 1300.

I know one solution is to create a join table, but that requires updating the join table via a script everytime one of the datasets changes. Are there any other solutions? Is there a way to determine the currently active record, so that I could have a calculated field that's equal to 1 for the current record, and 0 for all other records, which I could then include in the relationship? Thanks for any help!

-Terence

Ugo DI LUCA · November 18, 2004

You could fill a global value in the Gene's table with that current value of the current record in the Insert Table, with a script to update the Portal with a RefreshWindow (flush cache).

Barbecue · November 18, 2004

This seems like it should be possible by using recursive custom functions, though I'm no expert on them.

What's the objection to using a join table and a script? How are the datasets being updated?

TerenceM · November 19, 2004

I was hoping there was a trick to do this without having to hit a button, although I guess that's not an unbearable hardship. I could also take over full control of navigation with buttons for switching records and finds, and always update the global during those scripts.

Making and maintaining a join table would be daunting: the INSERTS table has 66000 records, the GENES table has 14000 records, and a join table would probably be ~200000 records (or over a million records with some related calculations). At least the GENES table doesn't change very often, but new records are constantly added to the INSERTS table.

Thanks for the suggestions. I'll keep an eye open on this thread if any other thoughts come up.

-Terence

Ugo DI LUCA · November 19, 2004

With such a number of records, involving a global might slow things, so if you can script the navigation, then, you may try, as this is a One To Many relationship to :

1. Set a number (increasing to each step) indexed field in the current record whenever you'd move to a record. This can be done with a 2 lines script involving a global :

SetField [number; global+1]

SetField [global; number]

or by a neat trick by Bruce Robertson (BruceR here on the Forums) that involve the LookupNext ( ) function. Search for this post if you wish.

2. Use an unstored calculation in the Gene Table that will grab this max number in Insert.

3. Base a new relationship back to the Insert Table with this max number AND the Start/End to Location relationship.

4. Now a calc in Gene can use a reference to the current record.

I'm unsure how this solution works compared to the global thing, but it may be worth a try.

TerenceM · November 19, 2004

That's a very clever way to approach transfering the information to the second table, although I suspect the Max (Number) calculation over 66000 records could be a bit slow. I'll play around with it.

The big trick for me with the size of these databases is to avoid the >= and <= relationships. In this case, I just round the insert position off to the nearest 500, construct a keylist for the genes with every 500th position between start and end (and a little beyond), and use an = relationship. This works MUCH faster, although I'm disappointed that I can't use the precise and elegant solution offered in FP7.

Thanks for the suggestions.

-Terence

Ugo DI LUCA · November 19, 2004

Hi,

Rather than Max( ), use either Bruce idea which is pretty quick and neat.

Based on my first own tests on the subject with a 450,000 records set a few weeks ago, I found that what slows things in the FM7 built in range relationship is the number of fields and calculations in the targetted table. The more I added fields in the building process, the longer the elapsed time was for the filtered relationship on a range, would they be indexabe fields or not (of course unstored calcs were amplifying the behaviour).

I'm still clueless why a Join table would lead to 200,000 records and up to 1 million as you said.

If the real join doesn't meet your needs, you may rely on a simily Join Table that holds the minimum required fields, but that can be used as an alternative path to the targetted Table.

In your case, I would think of a small Table holding a scripted Unique ID for a given Start and End values, call it RangeGrid.

Of course, each Gene records would be given its referenced RangeGridID.

If I understand your goal, I would think there would be less records in this table than in the Gene one, even lessening your rounding to a 100th instead of 500th.

Displaying the Gene records through a Relationship from Insert to RangeGridID is not an option, even if possible, while a Multiline key in Insert holding its related RangeGridIDs from a range relationship to the RangeGrid Table, could be used to target the GeneTable quicker.

A scripted update of this multiline key in Insert would be my preferred method in this instance.

transpower · November 19, 2004

Have you tried constructing a double non-equijoin relationship using the Relationships Graph? This will get the correct records into your portal. Then it's just a matter of adding a calculation field to the portal which will use values from the portal record and the hosting record.

TerenceM · November 19, 2004

Ugo -- The Join table would be large because I want a listing of all nearby genes for a given insert (commonly just two, but sometimes as many as five), hence the join table would have ~3 times as many records as the inserts table. I'd also like to be able to list information for individual transcripts of a given gene, leading to the million record estimate. The RangeGrid table you suggested might shrink things a little, but not very much given the data, but I'll experiment around to see if its any faster.

I'm perplexed by why having more (unrelated) fields in a table would slow things down, even if the related fields are indexed. But it does mean I could have a simple intermediate table with just the geneID, start, and end, and relate that to a second gene table that holds other pertinent information. Useful to know.

Transpower -- I'm not sure I understand your suggestion. I'm essentially using the relationship:

location >= start and location <= end

which works fine. But my main problem is that multiple insert locations match a single gene, making a many-to-many (Insert to Gene) relationship, but I can only do math with the first Insert for each Gene. Is there a way to do the math using data from the currently viewing Insert? The Script trick suggested by Ugo will do the job (to let the Gene db know which Insert I'm currently viewing), but I would still prefer a more elegant solution.

Is there a way to write a calculation directly on a layout? My problem would be solved if I could do math between merge fields, but it doesn't look like FP7 supports that, either.

-Terence

transpower · November 20, 2004

What do you mean by "essentially"? What does your Relationships Graph look like? What does the Connection Dialog Box show?

Add a calculation field directly to the portal row. Base this calculation on values from the two tables.

TerenceM · November 21, 2004

For the example as described, the relationship is exactly as described:

location >= start

AND location <= end

I said essentially because the full implementation of this database uses the rounding and keylist approach:

rounded_location = rounded (location / 500 ; 0 ) * 500

Start_end_List = every 500th value between start and end

relationship is Rounded_location = Start_end_list

Using this relationship, I have a calculation field in the GENES table with the calculation:

INSERTS:location - GENES:start

If I include this in the portal row, it works, but always uses the first INSERT found by the relationship, even if I'm looking at the third matching INSERT. The problem is its a many-to-many relationship. I was hoping there was a way to determine the record currently being viewed, which I could use to reduce it to a one-to-many relationship, but I guess with the ability to have multiple windows open that wouldn't really work anyway (even worse with multiple users, but that's not an issue with this database).

The full database is still in a conceptual stage, so I'm just trying to decide the best way to implement the data structure to allow me to display everything I want. I'm thinking I'll probably go with the script approach, although I haven't ruled out making the join file yet.

-Terence

transpower · November 21, 2004

OK, you're in the GENES table looking at a portal of INSERTS. Should there not be a calculation displayed for each row of the portal? That is, I don't understand why you would be seeing just one calculation (based on the first INSERT) and not all of the calculations. Each value should be different because the INSERTS::location is different.

Maybe if you attach a stripped down version of your tables, we could figure this out.

TerenceM · November 21, 2004

In this case, I'm in the Inserts table, looking at a portal of GENES. Sorry if I didn't make that clear.

Ultimately I want to be able to do this calculation from both sides (in a portal of GENES on an INSERTS layout, and in a portal of INSERTS on a GENES layout). Both calculations have the same problem, since its a many-to-many relationship (one Insert can be near several Genes, and One Gene can have lots of Inserts).

For the first version (Inserts table, looking at a portal of GENES): There should be a calculation displayed for each row of the portal, and there is (INSERTS:location - GENES:start). But the calculation will always use the first INSERT that matches the relationship, even if its not the one currently being viewed. I understand why this happens, I'm just hoping there's some clever way to modify the relationship so that its one-to-many based on the record currently being viewed, without having to run a script.

Does this make things a little clearer? I have a small test file at work, which I can post tomorrow if it'll help.

-Terence

transpower · November 22, 2004

OK. In your Relationship Graph, add two more Table Occurrences (TO's). This time, make it a double non-equijoin the other way. So you'll have four tables displayed: two will have a double non-equijoin one way, and two the other way.

Ugo DI LUCA · November 22, 2004

Hmmm....

An Insert may have 5 genes, but these 5 genes may be affiliated to 12 other Inserts Transpower.

You may duplicate as many tables as you wish, each record from Children Table, in each portal row, necessarily point to its first related Parent, unless you specify using a global or any temporary field/method which ParentID it should get the math from, in another Table Occurrence.

In this case, the ID should be the current browsed ParentID.

At this stage, there's no other way than a script, but there might be better ways of getting the count straight. I was thinking of LookupNext( ) just a few minutes ago, and it could work here too.

Sign In

Math between related records

Recommended Posts

TerenceM

Ugo DI LUCA

Barbecue

TerenceM

Ugo DI LUCA

TerenceM

Ugo DI LUCA

transpower

TerenceM

transpower

TerenceM

transpower

TerenceM

transpower

Ugo DI LUCA

Create an account or sign in to comment

Create an account

Sign in

Browse

Site Support

Forums

Blogs

Marketplace

Activity

Important Information