Similarity searches?

August 13, 2001

Hi all together,

I'm working in the field of biology/bioinformatics. I have generated a database of proteins. The most important field holds the protein sequence, which is basically a stretch of characters such as GGLKKLGKKLEGVGKRVFKASEKALPVAVGIKALG. These sequences can be as long as 2000 characters. The task is to search this field with another protein sequence for similarities and sort the results from highly similar (=identical) down to less similar.

Up to now I have done this by incorporating the DOS/Mac versions of the BLAST program from the NCBI (http://www.ncbi.nlm.nih.gov/BLAST/blast_overview.html), which are called by send message commands etc. For a number of reasons I would like to do everything in FileMaker.

Has anyone tried this already and, if so, could he/she give me some help on the concept of programming it? The NCBI's explanation of this algorithm is somewhat too mathematical for me (http://www.ncbi.nlm.nih.gov/BLAST/tutorial/Altschul-1.html).

Thank you in advance

Daniel

August 14, 2001

Problem: FileMaker Pro indexes and only searches on the first 20 characters of a word -- where a word is a string of characters without a space -- and (IIRC) 60 words in each field. So to do what you want you might have to insert a space after every 20th character in your protein string.
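
To make the spacing idea concrete, here is a minimal sketch in Python (not FileMaker script) of inserting a space after every 20th character. The function name `chunk_sequence` and the `width` parameter are made up for illustration. Note that if both limits recalled above hold, 60 words of 20 characters cover only 1200 characters, so a 2000-character sequence would still not be fully indexed.

```python
def chunk_sequence(seq: str, width: int = 20) -> str:
    """Return seq with a space inserted after every `width` characters.

    `width` defaults to 20 to match the per-word index limit mentioned
    above; the name and signature are hypothetical, for illustration only.
    """
    if width < 1:
        raise ValueError("width must be at least 1")
    # Slice the sequence into consecutive pieces of at most `width`
    # characters, then join the pieces with single spaces.
    pieces = [seq[i:i + width] for i in range(0, len(seq), width)]
    return " ".join(pieces)


if __name__ == "__main__":
    example = "GGLKKLGKKLEGVGKRVFKASEKALPVAVGIKALG"
    print(chunk_sequence(example))
```

Removing the spaces again gives back the original sequence, so the chunked copy could live in a separate search field next to the untouched one.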

As for doing it in FMP: you'll first have to develop the algorithm, then write a script that does the same job. How would you perform the task yourself? Write down the steps. Is it iterative in places? How many iterations are needed?
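
As one example of writing the steps down, here is a rough sketch in Python of a much simpler stand-in for BLAST: break each sequence into overlapping words of 3 letters, count how many distinct words two sequences share, and sort by that. This is not the BLAST algorithm (no substitution matrix, no hit extension, no statistics), only a shared-word score; the names `kmers`, `similarity` and `rank`, the word length of 3, and the sample records are all assumptions for illustration.

```python
def kmers(seq: str, k: int = 3) -> set:
    """Return the set of all overlapping words of length k in seq."""
    # For sequences shorter than k the range is empty and so is the set.
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def similarity(query: str, target: str, k: int = 3) -> float:
    """Shared-word score between 0.0 and 1.0 (Jaccard index of k-mer sets).

    1.0 means both sequences contain exactly the same set of words;
    0.0 means they share no word at all.
    """
    q, t = kmers(query, k), kmers(target, k)
    if not q or not t:
        return 0.0
    return len(q & t) / len(q | t)


def rank(query: str, database: dict, k: int = 3) -> list:
    """Return (score, name) pairs, most similar first.

    `database` maps a record name to its sequence. One pass over all
    records is needed, so the work grows with the number of records.
    """
    scored = [(similarity(query, seq, k), name) for name, seq in database.items()]
    # Sort by descending score; ties are broken alphabetically by name.
    return sorted(scored, key=lambda pair: (-pair[0], pair[1]))


if __name__ == "__main__":
    # Hypothetical records, made up for this example.
    db = {
        "identical": "GGLKKLGKKLEGVGKRVFKASEKALPVAVGIKALG",
        "truncated": "GGLKKLGKKLEGVGKRVFKA",
        "unrelated": "MMMMWWWWYYYYHHHH",
    }
    for score, name in rank("GGLKKLGKKLEGVGKRVFKASEKALPVAVGIKALG", db):
        print(f"{score:.2f}  {name}")
```

The loop structure answers the questions above: one pass over the query to build its words, then one pass over every record in the database, each of which needs its own pass to build words.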

I have a gut feeling that any solution created in FMP will be orders of magnitude slower than a purpose-written program.