mis-matching relationships


pedrofp

This topic is 8455 days old. Please don't post here. Open a new topic instead.

G'day folks

I'm building a database of little chunks of gene sequences (alphabetic-only text strings up to around 150 characters long). Each sequence must be uniquely numbered, with later occurrences of the same sequence inheriting the same number as the first. To do this I've created a lookup database holding the first occurrence of each string and the ID number assigned to it. The files are related through the sequence fields, and the ID number in the main database is defined to auto-fill by lookup from the lookup database. New data is imported as sequences from spreadsheets. If a sequence already exists, it inherits the original's ID number; those that remain are then given new numbers and copied to the lookup database.
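For readers outside FileMaker, the intended numbering scheme can be sketched in Python. This is only an illustration of the logic described above, not the poster's actual setup: the `assign_ids` function and the dictionary standing in for the lookup database are hypothetical, and the normalisation step is an assumption.

```python
def assign_ids(sequences, lookup=None):
    """Give each sequence an ID, reusing the ID of an earlier identical sequence.

    `lookup` is a hypothetical stand-in for the lookup database: sequence -> ID.
    """
    lookup = {} if lookup is None else lookup
    next_id = max(lookup.values(), default=0) + 1
    ids = []
    for seq in sequences:
        # Assumption: case and surrounding whitespace are not significant.
        key = seq.strip().upper()
        if key not in lookup:
            # First occurrence: assign a new number and record it.
            lookup[key] = next_id
            next_id += 1
        ids.append(lookup[key])
    return ids, lookup


imported = [
    "CKSOGSSCSOTSYNCCRSCNOYTKRC",
    "CKSOGSSCSOTSYNCCRSCNOYTKRCYG",  # longer sequence, must get its own ID
    "CKSOGSSCSOTSYNCCRSCNOYTKRC",    # repeat of the first, inherits its ID
]
ids, lookup = assign_ids(imported)
print(ids)
```

The key point is that the comparison uses the whole string, so two sequences share an ID only if every character matches.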

It mostly works well, except that sometimes a sequence will match a very similar sequence and incorrectly inherit its ID number. Examples are ...

CTPYGGHCGYHNDCCSHQCNINRT KCE matches with the new sequence ...

CTPYGGHCGYHNDCCSHQCNINRN KCE

CATYGKPCGIQNDCCNTCDPAR KTCT matches with the new sequence ...

CATYGKPCGIQNDCCNTCDPAG KTCT

CGKPGDTCGKLYLKCCSGRCSGK CLP matches with the new sequence ...

CGKPGDTCGKLYLKCCSGRCSGS CSGKCLP

CKSOGSSCSOTSYNCCRSCNOYTKRC matches with the new sequence ...

CKSOGSSCSOTSYNCCRSCNOYTKRC YG

CKQSGEMCNLLDQNCCDGYCIVF VCT matches with the new sequence ...

CKQSGEMCNLLDQNCCDGYCIVL VCT

TTCCGYDPGTMCPPCRCTNSC matches with the new sequence ...

TTCCGYDPGTMCPPCRCTNSC PTKPKKPGRRND

In all of these I've inserted a space after the first difference. I've also noted that the earliest difference is the 21st character.

I'm at a loss as to how to eliminate these mismatches; any advice would be very much appreciated.
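A quick way to check where each pair first diverges is a small Python helper like the one below. It is a sketch only: `first_difference` is a hypothetical name, and the sequences are the examples above with the inserted space removed, on the assumption that the space is not part of the stored data.

```python
def first_difference(a, b):
    """Return the 1-based position of the first differing character, or None if equal."""
    for position, (x, y) in enumerate(zip(a, b), start=1):
        if x != y:
            return position
    if len(a) != len(b):
        # One string is a prefix of the other; they diverge just past the shorter one.
        return min(len(a), len(b)) + 1
    return None


# Example pairs from this post, with the marker space removed (assumption).
pairs = [
    ("CTPYGGHCGYHNDCCSHQCNINRTKCE", "CTPYGGHCGYHNDCCSHQCNINRNKCE"),
    ("CATYGKPCGIQNDCCNTCDPARKTCT", "CATYGKPCGIQNDCCNTCDPAGKTCT"),
    ("CGKPGDTCGKLYLKCCSGRCSGKCLP", "CGKPGDTCGKLYLKCCSGRCSGSCSGKCLP"),
    ("CKSOGSSCSOTSYNCCRSCNOYTKRC", "CKSOGSSCSOTSYNCCRSCNOYTKRCYG"),
    ("CKQSGEMCNLLDQNCCDGYCIVFVCT", "CKQSGEMCNLLDQNCCDGYCIVLVCT"),
    ("TTCCGYDPGTMCPPCRCTNSC", "TTCCGYDPGTMCPPCRCTNSCPTKPKKPGRRND"),
]

positions = [first_difference(old, new) for old, new in pairs]
for (old, new), position in zip(pairs, positions):
    print(position, old, new)
print("earliest difference:", min(positions))
```

If every mismatch diverges only beyond some fixed position, that points to the comparison being made on a truncated prefix rather than on the full string.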

[ March 23, 2001: Message edited by: pedrofp ]

quote:

Originally posted by LiveOak:

FM only indexes the first 20 (or is it 22) characters of a "word". You'll need to insert spaces to break the 150 character sequences into several "words". -bd

FMP only indexes 20 characters of each word and only 60 characters of each line! More importantly, it will STOP indexing once you exceed either of these limits (or maybe it is just the line limit, but I do not think so).

What this means is that if your first line is 61 characters long, FMP will stop indexing the field right there, even if every other line is only 10 characters!

Be cautious with long indexes!
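To see why a per-word limit would produce exactly these false matches, here is a toy model in Python. It is not FileMaker's actual indexing: `truncated_key` and `chunk` are hypothetical helpers, the 20-character word limit is taken from the replies above as an assumption, and the 60-character line limit is not modelled at all.

```python
WORD_LIMIT = 20  # per-word limit quoted in this thread; an assumption, not verified


def truncated_key(text, limit=WORD_LIMIT):
    """Toy index key: keep only the first `limit` characters of each word."""
    return " ".join(word[:limit] for word in text.split())


def chunk(sequence, size=WORD_LIMIT):
    """Insert spaces so that no 'word' is longer than `size` characters."""
    return " ".join(sequence[i:i + size] for i in range(0, len(sequence), size))


old = "CATYGKPCGIQNDCCNTCDPARKTCT"
new = "CATYGKPCGIQNDCCNTCDPAGKTCT"

# Unbroken strings are one long word each, so only the first 20 characters count.
unbroken_collide = truncated_key(old) == truncated_key(new)

# After chunking, the differing character falls inside the second word.
chunked_collide = truncated_key(chunk(old)) == truncated_key(chunk(new))

print(unbroken_collide, chunked_collide)
```

Under this model the unbroken sequences collide and the chunked ones do not, which matches the suggestion to break the sequences into several "words". Note that a 150-character sequence split this way is still longer than 60 characters, so if the line limit mentioned above also applies, chunking within a single line may not be enough on its own.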
