mis-matching relationships

pedrofp · March 23, 2001

G'day folks

I'm building a database of little chunks of gene sequences [alphabetic-only text strings up to around 150 characters long]. Each sequence must be uniquely numbered, with later occurrences of the same sequence inheriting the same number as the first. To do this I've created a lookup database holding the first occurrence of each string and the ID number assigned to it. The files are related through the sequence fields, and the ID number in the main database is defined to autofill by lookup from the lookup database. New data is imported as sequences from spreadsheets; if a sequence already exists it inherits the original's ID number, and those that remain are then given new numbers and copied to the lookup database.
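As a rough illustration of the intended numbering rule (not of how FileMaker does it), here is a minimal Python sketch. The function name `assign_ids` and the choice of consecutive numbers starting at 1 are assumptions made for the example.

```python
def assign_ids(sequences, lookup=None):
    """Give each sequence an ID, reusing the ID of an earlier identical sequence.

    `lookup` plays the role of the lookup database: it maps each sequence
    (first occurrence) to the ID number assigned to it.
    """
    lookup = {} if lookup is None else lookup
    ids = []
    for seq in sequences:
        if seq not in lookup:
            # Hypothetical numbering scheme: next unused consecutive number.
            lookup[seq] = len(lookup) + 1
        ids.append(lookup[seq])
    return ids, lookup


# First import: the repeated sequence inherits the ID of its first occurrence.
ids, lookup = assign_ids(["AAA", "CCC", "AAA"])
print(ids)  # [1, 2, 1]

# Later import against the same lookup: known sequences keep their ID.
ids2, lookup = assign_ids(["CCC", "GGG"], lookup)
print(ids2)  # [2, 3]
```

The comparison here is on the whole string, which is the behaviour I'm after.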

It mostly works well, except that sometimes a sequence will match a very similar sequence and incorrectly inherit its ID number. Examples are ...

CTPYGGHCGYHNDCCSHQCNINRT KCE matches with the new sequence ...

CTPYGGHCGYHNDCCSHQCNINRN KCE

CATYGKPCGIQNDCCNTCDPAR KTCT matches with the new sequence ...

CATYGKPCGIQNDCCNTCDPAG KTCT

CGKPGDTCGKLYLKCCSGRCSGK CLP matches with the new sequence ...

CGKPGDTCGKLYLKCCSGRCSGS CSGKCLP

CKSOGSSCSOTSYNCCRSCNOYTKRC matches with the new sequence ...

CKSOGSSCSOTSYNCCRSCNOYTKRC YG

CKQSGEMCNLLDQNCCDGYCIVF VCT matches with the new sequence ...

CKQSGEMCNLLDQNCCDGYCIVL VCT

TTCCGYDPGTMCPPCRCTNSC matches with the new sequence ...

TTCCGYDPGTMCPPCRCTNSC PTKPKKPGRRND

In all of these I've inserted a space after the first difference. I've also noted that the earliest difference is the 21st character.

I'm at a loss as to how to eliminate these mismatches. Any advice would be very much appreciated.

[ March 23, 2001: Message edited by: pedrofp ]

signal · March 23, 2001

FileMaker only indexes a certain number of characters per word. You must be exceeding that limit. Can you insert a space somewhere in the string? That should fix the problem.

LiveOak · March 23, 2001

FM only indexes the first 20 (or is it 22) characters of a "word". You'll need to insert spaces to break the 150 character sequences into several "words". -bd
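To see why a per-word limit would produce exactly these mismatches, here is a small Python model. It is only a sketch: `index_key` is a hypothetical stand-in for the index entry, and the 20-character figure is the one quoted in this thread, not a verified constant.

```python
WORD_LIMIT = 20  # figure quoted in the thread; treat it as an assumption


def index_key(text, word_limit=WORD_LIMIT):
    """Model an index entry: each space-separated word cut to `word_limit` characters."""
    return tuple(word[:word_limit] for word in text.split())


# First example pair from the original post, with the marker space removed.
old = "CTPYGGHCGYHNDCCSHQCNINRTKCE"
new = "CTPYGGHCGYHNDCCSHQCNINRNKCE"

print(old == new)                        # False
print(index_key(old) == index_key(new))  # True
```

The two strings differ at the 24th character, past the cut-off, so under this model they look identical to the relationship.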

Kurt Knippel · March 23, 2001

quote:

Originally posted by LiveOak:

FM only indexes the first 20 (or is it 22) characters of a "word". You'll need to insert spaces to break the 150 character sequences into several "words". -bd

FMP only indexes 20 characters of each word and only 60 characters of each line! More importantly, it will STOP indexing after you exceed either of these limits (or maybe it is just the line limit, but I do not think so).

What this means is that if your first line is 61 characters long, FMP will stop indexing the field right there. Even if every other line is 10 characters!

Be cautious with long indexes!

BobWeaver · March 24, 2001

Even if you put in a space after every 20 characters, FM will only use a maximum of 60 characters for indexing.
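A rough Python model of both limits together shows the effect. Everything here is an assumption for illustration: the helper names `chunk` and `index_key` are made up, the 20 and 60 figures are the ones quoted in this thread, and whether the spaces count toward the 60 is a guess.

```python
def chunk(seq, size=20):
    """Break a sequence into space-separated 'words' of at most `size` characters."""
    return " ".join(seq[i:i + size] for i in range(0, len(seq), size))


def index_key(text, word_limit=20, total_limit=60):
    """Model an index entry: only the first `total_limit` characters are read
    (spaces included, by assumption), and each word is cut to `word_limit`."""
    return tuple(word[:word_limit] for word in text[:total_limit].split())


a = "ACDEFGHIKL" * 15            # made-up 150-character sequence
b = a[:100] + "W" + a[101:]      # differs from a at the 101st character
c = a[:30] + "W" + a[31:]        # differs from a at the 31st character

print(index_key(chunk(a)) == index_key(chunk(b)))  # True
print(index_key(chunk(a)) == index_key(chunk(c)))  # False
```

Under this model, chunking fixes differences that fall early in the string, but two sequences that first differ beyond the 60-character window still look the same.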

pedrofp · March 24, 2001

Well Folks, my problem isn't solved, but now that I've had the cause well and truly explained I can go off & try to figure out a fix.

Thanx to you all.

signal · March 25, 2001

I would think it would be easy. Just create a text calculation that inserts a space 15 characters into your text and index on that calculation instead.
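Here is a sketch in Python of what such a calculation would do, checked against three of the pairs from the first post (marker spaces removed). `spaced_key` and `index_key` are hypothetical names, and the 20-character word limit is the figure quoted earlier in this thread, so treat the model as an assumption rather than FileMaker's actual behaviour.

```python
def spaced_key(seq, position=15):
    """Insert a single space after `position` characters."""
    if len(seq) <= position:
        return seq
    return seq[:position] + " " + seq[position:]


def index_key(text, word_limit=20):
    """Model an index entry: each space-separated word cut to `word_limit` characters."""
    return tuple(word[:word_limit] for word in text.split())


pairs = [
    ("CTPYGGHCGYHNDCCSHQCNINRTKCE", "CTPYGGHCGYHNDCCSHQCNINRNKCE"),
    ("CKSOGSSCSOTSYNCCRSCNOYTKRC", "CKSOGSSCSOTSYNCCRSCNOYTKRCYG"),
    ("TTCCGYDPGTMCPPCRCTNSC", "TTCCGYDPGTMCPPCRCTNSCPTKPKKPGRRND"),
]

for old, new in pairs:
    collides_raw = index_key(old) == index_key(new)
    collides_spaced = index_key(spaced_key(old)) == index_key(spaced_key(new))
    print(collides_raw, collides_spaced)  # True False for each pair
```

With one space, this model only looks at the first 35 characters (15 plus 20) of each sequence, which is enough for these short examples.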

pedrofp · March 25, 2001

I've considered an approach like that, but if, as others say, FileMaker only indexes 60 characters, then even this approach won't work for the long sequences.
