BobWeaver Posted July 12, 2002 Sample file showing a method for finding occurrences of words within a specified distance of each other. Refer to the script documentation to see how it works. This is in reference to the following thread: http://www.fmforums.com/threads/showflat.php?Cat=&Board=UBB6&Number=40809&page=0&view=collapsed&sb=5&o=31&fpart=1
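The sample file uses a FileMaker script, and its documentation explains how that script works. For anyone who wants to experiment with the idea outside FileMaker, here is a minimal Python sketch under a few stated assumptions: words are split on whitespace with punctuation stripped, matching is case-insensitive, and "within N" means the two words are at most N word positions apart in either order. It is not a translation of Bob's script, and the names `tokenize` and `within_distance` are hypothetical.

```python
import re
from typing import List


def tokenize(text: str) -> List[str]:
    """Split text into lowercase words, dropping punctuation (an assumption)."""
    return re.findall(r"[a-z0-9']+", text.lower())


def within_distance(text: str, word1: str, word2: str, distance: int) -> bool:
    """Return True if word1 and word2 occur at most `distance` words apart,
    in either order."""
    words = tokenize(text)
    positions1 = [i for i, w in enumerate(words) if w == word1.lower()]
    positions2 = [i for i, w in enumerate(words) if w == word2.lower()]
    # Compare every occurrence of word1 with every occurrence of word2;
    # a gap of 0 would be the same word position, so it is excluded.
    return any(
        0 < abs(i - j) <= distance for i in positions1 for j in positions2
    )


records = [
    "The big white dog jumped over the fence.",
    "A little cat slept all day.",
]
matches = [r for r in records if within_distance(r, "dog", "big", 3)]
print(matches)
```

Checking every pair of occurrences matters when a word appears more than once in a record, since any one close pair is enough for a match.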
BobWeaver Author Posted July 12, 2002 Hmm, I don't know why my URL didn't turn into a link automatically. Oh well.
Keith M. Davie Posted August 11, 2002 Bob, a "proximity" find, an interesting concept. So I looked at the first record and saw that big came 16 words after dog. (...dog [1]jumped [2]over [3]the [4]fence, [5]and [6]before [7]Henry [8]was [9]able [10]to [11]stop [12]him, [13]he [14]chased [15]the [16]big...) So I set "Find Word 1: dog", "Find Word 2: big", and "Proximity: 16". When I clicked the find button, all eleven records were found, including record 7, which reads "...dog [1]jumped [2]over [3]the [4]fence, [5]and [6]before [7]Henry [8]was [9]able [10]to [11]stop [12]him, [13]he [14]chased [15]the [16]little..." I have not studied the calculation or the script, but something does not seem right with a proximity this large. Is there a limit to the proximity that can be checked?
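Keith's hand count can be reproduced mechanically. This small Python sketch numbers the words after "dog" the same way his bracketed numbers do, assuming punctuation is stripped and words are split on whitespace. The helper name `word_gap` is hypothetical and is not part of Bob's sample file.

```python
import re
from typing import Optional


def word_gap(text: str, first: str, second: str) -> Optional[int]:
    """Word positions from the first occurrence of `first` to the next
    occurrence of `second` after it, or None if there is no such pair."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    first, second = first.lower(), second.lower()
    if first not in words:
        return None
    start = words.index(first)
    # Count forward from `first`, numbering the following words 1, 2, 3, ...
    for offset, word in enumerate(words[start + 1:], start=1):
        if word == second:
            return offset
    return None


phrase = ("dog jumped over the fence, and before Henry was able "
          "to stop him, he chased the big")
print(word_gap(phrase, "dog", "big"))  # 16
```

Note that a proximity setting of 16 means "at most 16 words apart", so any record where "big" sits anywhere from 1 to 16 words from "dog" qualifies, not only records where the gap is exactly 16.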
Keith M. Davie Posted August 11, 2002 Bob, I had a good night's sleep and awakened thinking that I had not tested the solution quite enough. So I entered "Find Word 1: big", "Find Word 2: little" and "Proximity: 18". Interestingly, this returned 4 records. In two of those records "little" comes first in the sentence and precedes "big", e.g., "...little [1]white [2]dog [3]jumped [4]over [5]the [6]fence, [7]and [8]before [9]Henry [10]was [11]able [12]to [13]stop [14]him, [15]he [16]chased [17]the [18]big ..." On further examination of those records I recognized their similarities, and I altered one (the last of the found set) by deleting the word "him". I showed all records and, leaving the search criteria the same, performed the find again. Once again four records were found, including: "...big [1]white [2]dog [3]jumped [4]over [5]the [6]fence, [7]and [8]before [9]Henry [10]was [11]able [12]to [13]stop, [14]he [15]chased [16]the [17]big ..." I'm sorry to say that there seems to be a problem. I hope you find this useful.
BobWeaver Author Posted August 11, 2002 There quite possibly are problems with the method. I threw it together rather quickly in response to the question in the original thread. Although I tested it in quite a few different situations, I'm sure I left out a few. I guess I posted this more as a concept than a finished solution (okay, I'm copping out here). Anyway, I'll have another look at it when I have a few minutes and try to figure out the problem.
BobWeaver Author Posted August 11, 2002 Oh yes, I should have pointed out that the order of the search words is insignificant. So, in the first test that you pointed out, if the word 'big' occurs anywhere up to 16 words before or after the word 'dog', then it is a match. Since many of the records contained the phrase 'big white dog', they would meet the criteria, and as a result, all 11 records in my sample file correctly match. I didn't check out the situation mentioned in your second post. Could it be related to this same word-order thing? It's actually easier to make the method order-dependent, but I didn't think it would be quite as useful. For example, if you wanted to find all records that have references to Thomas Edison, you could enter Thomas as word 1, Edison as word 2, and a distance of 2. This would then match all of the following:

Edison, Thomas Alva
Thomas Alva Edison
Thomas Edison

which is probably what the user wants.
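To illustrate the difference between order-insensitive and order-dependent matching on the Thomas Edison example, here is a minimal Python sketch. It assumes words are split on whitespace with punctuation stripped and compared case-insensitively; the function `near` and its `ordered` flag are hypothetical names, and the sketch is not Bob's FileMaker script.

```python
import re
from typing import List


def tokenize(text: str) -> List[str]:
    """Lowercase words with punctuation stripped (an assumption)."""
    return re.findall(r"[a-z0-9']+", text.lower())


def near(text: str, word1: str, word2: str, distance: int,
         ordered: bool = False) -> bool:
    """True if word1 and word2 are at most `distance` words apart.
    With ordered=True, word2 must also come after word1."""
    words = tokenize(text)
    pos1 = [i for i, w in enumerate(words) if w == word1.lower()]
    pos2 = [i for i, w in enumerate(words) if w == word2.lower()]
    for i in pos1:
        for j in pos2:
            # Ordered: a negative gap (word2 before word1) never matches.
            gap = j - i if ordered else abs(j - i)
            if 0 < gap <= distance:
                return True
    return False


names = ["Edison, Thomas Alva", "Thomas Alva Edison", "Thomas Edison"]
for name in names:
    print(name, near(name, "Thomas", "Edison", 2),
          near(name, "Thomas", "Edison", 2, ordered=True))
```

Only "Edison, Thomas Alva" changes result: it matches when order is ignored but not with `ordered=True`, because "Edison" comes before "Thomas". The order-dependent version differs by a single expression, which fits the remark that it is actually easier to build.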
Keith M. Davie Posted August 12, 2002 "...which is probably what the user wants." Yes, you are right there. And in that regard (e.g. Edison) it works well. I just saw these patterns and was testing what were no doubt extremes. But the explanation and caveats you provide are certainly reasonable, especially if the limits of its usage are clearly defined for the client.
BobWeaver Author Posted August 12, 2002 Anyway, thanx for the input. It's always good to have someone else double-check things. No matter how much debugging I do, I find it so easy to overlook things when I check my own work.