pedrofp Posted March 23, 2001

G'day folks,

I'm building a database of little chunks of gene sequences [alphabetic-only text strings up to around 150 characters long]. Each sequence must be uniquely numbered, with later occurrences of the same sequence inheriting the same number as the first. To do this I've created a lookup database holding the first occurrence of each string and the ID number assigned to it. The files are related through the sequence fields, and the ID number in the main database is defined to autofill by lookup to the lookup database. New data is imported as sequences from spreadsheets. If a sequence already exists it inherits the original's ID number; those that remain are then given new numbers and copied to the lookup database.

It mostly works well, except that sometimes a sequence will match a very similar sequence and incorrectly inherit an ID number. Examples are:

CTPYGGHCGYHNDCCSHQCNINRT KCE matches with the new sequence ... CTPYGGHCGYHNDCCSHQCNINRN KCE
CATYGKPCGIQNDCCNTCDPAR KTCT matches with the new sequence ... CATYGKPCGIQNDCCNTCDPAG KTCT
CGKPGDTCGKLYLKCCSGRCSGK CLP matches with the new sequence ... CGKPGDTCGKLYLKCCSGRCSGS CSGKCLP
CKSOGSSCSOTSYNCCRSCNOYTKRC matches with the new sequence ... CKSOGSSCSOTSYNCCRSCNOYTKRC YG
CKQSGEMCNLLDQNCCDGYCIVF VCT matches with the new sequence ... CKQSGEMCNLLDQNCCDGYCIVL VCT
TTCCGYDPGTMCPPCRCTNSC matches with the new sequence ... TTCCGYDPGTMCPPCRCTNSC PTKPKKPGRRND

In all of these I've inserted a space after the first difference. I've also noted that the earliest difference is the 21st character. I'm at a loss to eliminate these mismatches; any advice would be appreciated very much.

[ March 23, 2001: Message edited by: pedrofp ]
signal Posted March 23, 2001

FileMaker only indexes a certain number of characters per word. You must be exceeding that limit. Can you insert a space somewhere in the string? That should fix the problem.
LiveOak Posted March 23, 2001

FM only indexes the first 20 (or is it 22?) characters of a "word". You'll need to insert spaces to break the 150-character sequences into several "words". -bd
Kurt Knippel Posted March 23, 2001

quote: Originally posted by LiveOak:
FM only indexes the first 20 (or is it 22) characters of a "word". You'll need to insert spaces to break the 150 character sequences into several "words". -bd

FMP only indexes 20 characters of each word and only 60 characters of each line! More importantly, it will STOP indexing after you exceed either of these limits (or maybe it is just the line limit, but I do not think so). What this means is that if your first line is 61 characters long, FMP will stop indexing the field right there, even if every other line is 10 characters! Be cautious with long indexes!
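To see why the pairs in the original post collide, here is a toy Python model of the per-word limit described in the replies above. It is only a sketch: the 20-character figure is taken from this thread (LiveOak was unsure whether it is 20 or 22), and `index_key` is a hypothetical helper for illustration, not something FileMaker exposes.

```python
# Toy model of a per-word index limit. WORD_LIMIT is an assumption taken
# from this thread, not a verified FileMaker constant.
WORD_LIMIT = 20


def index_key(text, word_limit=WORD_LIMIT):
    """Return the words of `text`, each cut to `word_limit` characters."""
    return tuple(word[:word_limit] for word in text.split())


# Two pairs from the original post, with the marker spaces removed.
pairs = [
    ("CTPYGGHCGYHNDCCSHQCNINRTKCE", "CTPYGGHCGYHNDCCSHQCNINRNKCE"),
    ("TTCCGYDPGTMCPPCRCTNSC", "TTCCGYDPGTMCPPCRCTNSCPTKPKKPGRRND"),
]

for old, new in pairs:
    # First value: the full strings differ. Second value: the keys match.
    print(old != new, index_key(old) == index_key(new))
```

Under this model any two sequences that share their first 20 characters get the same key, which is consistent with every reported mismatch differing only after the 20th character.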
BobWeaver Posted March 24, 2001

Even if you put in a space after every 20 characters, FM will only use a maximum of 60 characters for indexing.
pedrofp (Author) Posted March 24, 2001

Well folks, my problem isn't solved, but now that I've had the cause well and truly explained I can go off and try to figure out a fix. Thanks to you all.
signal Posted March 25, 2001

I would think it would be easy. Just create a text calculation that inserts a space 15 characters into your text and index on that calculation instead.
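The idea behind that calculation can be sketched in Python. This is an illustration of the logic only, not FileMaker calculation syntax, and it generalises the suggestion slightly by inserting a space after every 15 characters rather than once. The helper names `insert_spaces` and `remove_spaces` are made up for this example.

```python
def insert_spaces(sequence, chunk=15):
    """Break `sequence` into space-separated words of at most `chunk` characters."""
    pieces = (sequence[i:i + chunk] for i in range(0, len(sequence), chunk))
    return " ".join(pieces)


def remove_spaces(spaced):
    """Undo insert_spaces to recover the original sequence."""
    return spaced.replace(" ", "")


seq = "CTPYGGHCGYHNDCCSHQCNINRTKCE"  # first example from the original post
spaced = insert_spaces(seq)
print(spaced)
print(remove_spaces(spaced) == seq)
```

Because every word stays under the per-word limit quoted earlier in the thread, the two sequences from the first example would differ in their second word instead of being truncated to the same key.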
pedrofp (Author) Posted March 25, 2001

I've considered an approach like that, but if, as others say, FileMaker only indexes 60 characters, then even this approach won't work for the long sequences.
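That concern can be illustrated with another toy Python model. It assumes, as a simplification, that only the first 60 characters of the spaced text (spaces included) take part in the match; how FileMaker actually counts that limit is not established in this thread. `spaced_key` is a hypothetical helper, and the 71-character strings are synthetic, not real sequence data.

```python
LINE_LIMIT = 60  # assumption based on the figure quoted in this thread


def spaced_key(sequence, chunk=15, line_limit=LINE_LIMIT):
    """Insert a space every `chunk` characters, then keep `line_limit` characters."""
    pieces = (sequence[i:i + chunk] for i in range(0, len(sequence), chunk))
    return " ".join(pieces)[:line_limit]


base = "ACDEFGHIKL" * 7  # synthetic 70-character prefix, not real data
a = base + "M"
b = base + "N"

# The sequences differ at character 71, beyond the assumed limit, so the
# first value printed is True (they differ) and so is the second (keys match).
print(a != b, spaced_key(a) == spaced_key(b))
```

In this model, sequences that differ within the first few words are told apart, but any pair that agrees up to the cutoff still collides, which is the case described for the long sequences.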