Genx Posted February 18, 2006

Here's the story: I need a calculation that will evaluate quickly, and it only has to evaluate to 1 or 0. I have two global fields in the same table; call them Field A and Field B. If all the values in Field A are in Field B (where Field B can have the values in Field A plus others), it evaluates to 1; otherwise it evaluates to 0.

I've tried a custom function referred to me by Søren, called ZapValues, using variables set equal to fields A and B:

    If ( IsEmpty ( ListA ) ; "" ;
      LeftValues ( ListA ; IsEmpty ( FilterValues ( LeftValues ( ListA ; 1 ) ; ListB ) ) ) &
      ZapValues ( RightValues ( ListA ; ValueCount ( ListA ) - 1 ) ; ListB )
    )

but it's too slow. Field A and Field B can have up to 10,000 values at one time, and currently this takes 30 seconds; I need something quicker that takes at most 5 seconds. Is there any way I can use some sort of self-join, or a combination of filter / exact value functions, that would be quicker?

Cheers,
~genx
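Outside FileMaker, the check being asked for (does every value in Field A also appear in Field B?) is a plain subset test. A minimal Python sketch of that logic, assuming return-delimited value lists (the function name and sample IDs are illustrative, not FileMaker syntax):

```python
def all_values_present(list_a, list_b):
    """Return 1 if every return-delimited value in list_a also
    appears in list_b, else 0 -- the 1/0 result described above."""
    values_a = [v for v in list_a.split("\n") if v]
    # A set makes each membership test O(1), so 10,000 values
    # stay fast -- unlike rescanning list B for every value of A.
    values_b = set(list_b.split("\n"))
    return 1 if all(v in values_b for v in values_a) else 0

print(all_values_present("id1\nid45", "id45\nid2\nid3\nid1"))  # 1
print(all_values_present("id1\nid99", "id45\nid2\nid3\nid1"))  # 0
```

The hashed-lookup approach is why this stays near-instant at 10,000 values, whereas a value-by-value recursion like ZapValues rescans the whole of list B once per value of list A.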
CyborgSam Posted February 19, 2006

It may speed things up radically if you sort both fields' values first; then you can simply compare the two fields. A custom function could take the field, copy it to a $ variable, sort its values, and return the sorted result. My guess is that by doing the sort in a custom function and using variables, FileMaker will be faster than doing it in scripts.

The best sort for most data is the Quick Sort; a Bubble Sort on data sets this large will take way too long. At the end of this drivel is my favorite Quick Sort algorithm. Since the data is treated as an array for sorting, there will be a LOT of moving the same data around as the field is sorted [lots of LeftValues (...) & RightValues (...) steps]. It may be quickest to do the sort in three steps:

1) Copy each value into a repeating global variable (e.g. the 793rd value goes into $$QuickSortArray[793]).
2) Quick Sort the array (the repeating global variable).
3) Reassemble a single field from the repeating global variable.

I started working on this as a brain exercise and decided I need to go watch a movie and quit obsessing... Below is my favorite Quick Sort algorithm, in Pascal.
IIRC from 22 years ago, it is a modification by Knuth of the "standard" Quick Sort; it is a touch faster because it handles endpoints more efficiently (it doesn't pass the endpoint's index to the recursion, thus involving fewer if tests).

    procedure QuickSort (lowIndex, highIndex: integer);
    var
      left : integer;
      right : integer;
      temp : integer;
      leftFixed : boolean;
    begin
      if ( lowIndex < highIndex ) then begin
        left := lowIndex;
        right := highIndex;
        leftFixed := true;
        while ( left <> right ) do begin
          if ( theData[left] > theData[right] ) then begin
            temp := theData[left];
            theData[left] := theData[right];
            theData[right] := temp;
            leftFixed := not leftFixed;
          end;
          if leftFixed then
            right := right - 1
          else
            left := left + 1;
        end;
        if ( left > lowIndex ) then QuickSort (lowIndex, left - 1);
        if ( right < highIndex ) then QuickSort (right + 1, highIndex);
      end;
    end;
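For anyone who wants to verify the algorithm outside Pascal, here is a direct port of the same swap-and-toggle Quick Sort variant to Python (a sketch only; `theData` becomes a list argument rather than a global array):

```python
def quick_sort(data, low, high):
    """Knuth-style variant: walk left and right toward each other,
    swapping out-of-order endpoints and toggling which end is
    'fixed', then recurse only on the sides that need it."""
    if low < high:
        left, right, left_fixed = low, high, True
        while left != right:
            if data[left] > data[right]:
                data[left], data[right] = data[right], data[left]
                left_fixed = not left_fixed
            if left_fixed:
                right -= 1
            else:
                left += 1
        if left > low:
            quick_sort(data, low, left - 1)
        if right < high:
            quick_sort(data, right + 1, high)

vals = [5, 1, 4, 2, 3]
quick_sort(vals, 0, len(vals) - 1)
print(vals)  # [1, 2, 3, 4, 5]
```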
Genx (Author) Posted February 19, 2006

Assuming the QuickSort procedure above is the custom function: what do I plug into lowIndex, highIndex, integer? Sorry, I'm a bit new to custom functions in general. Secondly, you say use a repeating global variable, which is fair enough for Field A; Field B, however, is likely to have previous values in it already. I know you've taken a lot of your time to answer this question, but could you possibly create a small sample script to show me how I would use all this? If not, thanks for the help anyway.

Cheers,
~genx
LaRetta Posted February 19, 2006

Maybe I'm not understanding what you need, but I'm unsure a custom function is needed here. Try this one:

    Let ( sort = FilterValues ( FieldA ; FieldB ) ;
      PatternCount ( sort ; FieldA )
    )

I assume this is just for developer work, since they are globals. Otherwise I would question a structure that requires 'up to 10,000 values' in one field. In fact, I am STILL shaking my head (ever so slightly).

LaRetta
LaRetta Posted February 19, 2006

Oh, sorry genx, Position() would be quicker still:

    Let ( sort = FilterValues ( FieldA ; FieldB ) ;
      Position ( sort ; FieldA ; 1 ; 1 )
    )

This way it won't have to look for the pattern throughout the list; it will short-circuit and stop evaluating on the first true. And are these lists pure, or might they contain spaces, extra carriage returns, or other unwanted characters? You may wish to add an additional garbage-stripper if so (one leading space in FieldA would break it). I have no idea how the data gets planted.

LaRetta
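For readers unfamiliar with the functions involved: FilterValues ( FieldA ; FieldB ) keeps only the values of FieldA that also occur in FieldB, and Position then checks whether FieldA survived the filtering intact. A rough Python model of the calculation (assuming clean lists with no duplicates or stray spaces, as LaRetta cautions; names are illustrative):

```python
def filter_values(text_a, text_b):
    """Rough model of FileMaker's FilterValues(): keep the values
    of text_a that also occur in text_b, preserving order."""
    keep = set(text_b.split("\n"))
    return "\n".join(v for v in text_a.split("\n") if v in keep)

def all_in_b(field_a, field_b):
    # If nothing was filtered out, the filtered result still
    # contains FieldA whole, so Position(...) finds it (true).
    return 1 if field_a in filter_values(field_a, field_b) else 0

print(all_in_b("id1\nid45", "id45\nid2\nid1"))  # 1
print(all_in_b("id1\nid99", "id45\nid2\nid1"))  # 0
```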
Genx (Author) Posted February 19, 2006

Data gets planted through a looping script that cycles through the found set of records and stocks id + CRLF. This then compares to a global or variable (I don't really care which) that contains a large number of IDs that have already been "flagged", also stocked as their relevant id + CRLF. Hope that clears it up; this problem is really aggravating me, so maybe someone can suggest a better way of doing it. The only reason that field would stock 10,000 values is that this particular layout would pretty much always have 10,000 records in the original found set. All the help is really appreciated.

~genx
Genx (Author) Posted February 19, 2006

Sorry, that last post wasn't too clear. I have two fields:

1) Stores all IDs in the found set
2) Stores IDs that have been added to it in the current session, in random order

i.e.

Field 1:
id1
id45
id1110

Field 2:
id45
id2
id3
id1
id1110

If all values in Field 1 are in Field 2, I need it to evaluate to true. The above example would evaluate to true.

~genx
LaRetta Posted February 19, 2006

Have you tried the calculation I provided? Because it does exactly what you've requested in your last post, does it not? (Attached.) Change the IDs all you want, in either field; the evaluation will hold, and it would work as a (global) calculation. If you receive a '1', your script can proceed under the premise that ALL of the IDs in Field A appear in Field B. The cResortedList is only there to display FilterValues() at work in the demo and won't be needed in the solution itself.

Pattern.zip
Søren Dyhr Posted February 19, 2006

"if all values in field 1 are in field 2... i need it to evaluate to true..."

I would say a relationship, from the gathered newly created IDs to the existing IDs, would establish any imbalances much quicker than any looping or CF'ing would do at all:

    Count ( Relation::RecordID ) = ValueCount ( Field 1 )

...provided the RecordID is unique, obviously. Hint: other relationship types than the equijoin exist, and could be exploited as a very fast strainer list. Say you re-evaluate ValueListItems( over an unequal relationship: it will give a list of unassigned IDs, if the plucked IDs exist in a global field which acts as primary key for the relationship that builds the value list.

What I'm saying here is that you might be burdened by parts of the solution solved by idiosyncrasies, such as "I must have access to event triggers", "looping is mandatory", and "solving issues with real data in repeating fields is unavoidable". Get a dog, give the keyboard a rest, exercise the dog by tossing sticks and such... and don't come back until the dog is dead exhausted or all sticks have been atomized. Then reverse the angle from which you usually solve matters, so objects come before scripts; an awful lot can be solved with relational structures and proper field definitions!! Why bother sorting into an index when FileMaker already does it behind the scenes?

--sd
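Søren's relationship test can be modeled outside FileMaker as a simple count comparison. A small Python sketch of the idea (field and relation names are placeholders), assuming IDs are unique as he notes:

```python
def relation_balanced(field1_values, table_ids):
    """Model of Count(Relation::RecordID) = ValueCount(Field 1):
    with unique IDs, a multiline key matches one related record per
    value that exists, so the counts agree exactly when every value
    in Field 1 has a matching record."""
    table_ids = set(table_ids)
    matched = sum(1 for v in field1_values if v in table_ids)
    return matched == len(field1_values)

print(relation_balanced(["id1", "id45"], ["id45", "id2", "id1"]))  # True
print(relation_balanced(["id1", "id99"], ["id45", "id2", "id1"]))  # False
```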
Søren Dyhr Posted February 19, 2006

I tried rereading your postings:

"1) Stores all id's in foundset 2) Stores id's that have been added to it in the current session... in random order..."

Is the found set a set of available items in a certain span of time?? If so, could you make a multi-criteria relationship that gives you the found set's IDs via ValueListItems( ? There would be no need to put them into a global field at all; it could be part of the validation of the other global field. LaRetta's idea with FilterValues is actually handy here, because the validation will fail if the number of values doesn't match the number of values being filtered (with ValueListItems( as the criterion), and you're not allowed to make new records by splitting the global into records.

But again, are we kicking tumbleweed, with methods perhaps not even needed, but produced by a stubborn mind that insists on scripting as much as possible???

--sd
Genx (Author) Posted February 19, 2006

...sigh. Thanks for all your help guys, but I give up. Your calc works great for up to about five or six hundred records, LaRetta, but anything above that it's too slow, like a minute. Anyway, I'm scrapping my current flagging system. My original plan was to flag all records in the found set and unflag them all using the same button, basically enacting one of two scripts when pushing that button; hence the whole point of using that evaluation. But like I said, I'll have to scrap that and just swap it for two different buttons: one to flag all found, two to remove all flagged. My end users will have to live. As for the dog idea, Søren: soon... soon. I just have to finish this; a week left and I'm hopefully free... until the next two projects I've got lined up, sigh. Anyway, thanks for all your time everyone.

~genx
CyborgSam Posted February 20, 2006

One way to prevent having to unflag all the records is to use a timestamp as the flag field and store that value elsewhere for use in relationships and/or searching:

1) Set a variable with a timestamp
2) Replace Field Contents with the variable
3) Store the variable in some field
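The three steps above can be sketched as follows (Python standing in for the script steps; the record storage and names are purely illustrative):

```python
import time

def flag_found_set(records, found_ids):
    """Stamp every record in the found set with one shared
    timestamp and return it, so a later relationship or find can
    match on the stored value -- and 'unflagging' everything is
    just discarding the stored stamp, not touching the records."""
    stamp = time.time()               # 1) set a variable with the timestamp
    for rid in found_ids:
        records[rid]["flag"] = stamp  # 2) Replace Field Contents with the variable
    return stamp                      # 3) store the variable in some field

records = {1: {}, 2: {}, 3: {}}
s = flag_found_set(records, [1, 3])
print([rid for rid, r in records.items() if r.get("flag") == s])  # [1, 3]
```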
Søren Dyhr Posted February 20, 2006

But you miss my point: it's probably not a scripted matter at all. Flagging as such points in the direction of inadequate relational design; that's what you were supposed to realize while tossing sticks away... Read the first 4 lines of this: http://en.wikipedia.org/wiki/Object-oriented_programming You seem to be caught in that notion! Play instead with multi-criteria relationships; since the flagging already goes on underneath the hood, why instate redundancy??

--sd
LaRetta Posted February 20, 2006

Hi genx, I'm curious how you performed your test on the calculation I gave you, because no matter what I do, when comparing 1,000 IDs to 10,000 IDs (10 characters each), the results are instantaneous (i.e., they won't even register a second on my FM counter). And yes, I'm in agreement that a relationship should be used, but this is still a good technique for many situations when comparing two text strings. I THOUGHT it was quick, and my tests substantiate it. I have NOT tested in a networked environment yet; however, because they are all globals they should run on the client side (?), so it shouldn't slow the calc down.

Remember that, since you are using globals to hold the data, you don't need an unstored calculation which must run on every record. You can use a global calculation, which will produce instant results. I have the test file and I'd be happy to script it up for a test or two!
LaRetta Posted February 20, 2006

I realize I had 'loaded' the results, because my sample data (FieldA) didn't have duplicates but my full data, FieldB (10,000 records), did. So I added duplicates to the sample data (FieldA). The sort, comparison, and resultant display take 5 seconds from start to finish. I can't see how it could possibly take one minute for 500 records (see attached).

When dealing with a found set, a value list won't be of help. I used one in this demo just to quickly populate the fields instead of a loop script. But in a real situation, a VL would be worthless with a found set (I believe).

Pattern.zip
Genx (Author) Posted February 20, 2006

It might not even be your calculation that's causing the lag, LaRetta. I've been trying a fair few things to get this whole process to work, and I haven't been cleaning up the mess I've made... As for my relational design, if someone can suggest a better way of doing this I'm all ears. Here's the scenario again. I personally think my data is split up pretty well: I have my contact details in one table, my names in another, etc., and I can manipulate things freely. But if there's some relational idea you can suggest that could improve the flagging:

1) I have a table with approx 10,000 records in it; it has a unique serial ID "idxxxxx" and a few more fields.
2) My current flagging system works by adding the ID to a global field (seeing as this will be run in a network environment with different users flagging their own records).
3) I need the ability to flag either the current record or the found set, which I can do at the moment, but I would love the process of checking whether a found set has been flagged to be quicker.
4) When all flagging is complete, I just go through a self-join relationship between the global and the serial ID to find all flagged records.

If anyone has a better way of doing this, really, I'll love you for it.

Cheers,
~genx
Genx (Author) Posted February 20, 2006

...Oh right, and I tested it by shoving it head first into my own database. In the meantime I'm going to clean it up a bit, wait and hope for someone to suggest a better structure for me, and try implementing your calcs again.

~genx
Søren Dyhr Posted February 21, 2006

"i just go through a self-join relationship between the global and the serialid to find all flagged records..."

What's the purpose of the flagging, and what's the criterion for it? I seem to recall something about whether newly arrived things that were not included should be included as well??? What I can't see is whether you use the flagging to store found sets that might have a best-by date. If so, should you perhaps consider another approach to saving them...

http://www.filemakermagazine.com/modules.php?op=modload&name=News&file=article&sid=544

--sd
Genx (Author) Posted February 21, 2006

The purpose of the flagging is a mail merge; the criterion is only that the user selects to flag the current record or the current found set. It is very temporary. But I like technique files, so I'll have a look at this one anyway.

~genx
Søren Dyhr Posted February 21, 2006

It's probably a question of lingo that prevents me from reaching you, but without knowing what "query criteria" the found set is established on, it is next to impossible to pinpoint where your relational structure is inadequate; only that your wish for tagging might be wrong, since it takes time to establish and isn't regarded as an object. A found set is established on records WHOSE value in some field matches... Is it clients in the database who have been neglected for a certain timespan, or those who hardly lay a penny in the till???

On the other hand, it could be arbitrary pluckings in the base. Here another approach comes in handy, Omit Multiple, which can be controlled this way; but again there is a penalty to it: unfortunately it takes some time to perform on, say, 17,000 records:

    Loop
      Go to Record/Request/Page [ Next; Exit after last ]
      Omit Multiple Records [ Truncate ( Random ; 1 ) * 10 ] [ No dialog ]
    End Loop

...even though nothing is written to fields to invoke validations etc. The yield is some 3,000 records. However, if you make a found set based on native queries, like the movie I guided you to shows, it is much faster. Say you wish a pseudo-arbitrary query to produce almost the same yield; it could be done this way:

    Show All Records
    Enter Find Mode [ ]
    Set Field [ Untitled::pling; "*4*" ]
    Perform Find [ ]
    Enter Find Mode [ ]
    Set Field [ Untitled::pling; "*1*" ]
    Constrain Found Set [ ]

The field I make the query in here is the record ID; other integers in the two-part combination yield other sizes of found sets. But the speed is much faster than any looping or CF can produce.

--sd
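To see why that loop yields roughly 3,000 of 17,000 records: each pass keeps the record it lands on and omits a random 0-9 of the following ones, so on average 1 record in 5.5 survives (17,000 / 5.5 ≈ 3,090). A quick Python simulation of the yield (an approximation of the script's effect, not of FileMaker behavior):

```python
import random

def simulate_omit_loop(total=17000, seed=42):
    """Simulate: Go to Next Record (keeps one record), then
    Omit Multiple Records [Truncate(Random; 1) * 10] (drops 0-9)."""
    random.seed(seed)
    kept = 0
    position = 0
    while position < total:
        kept += 1                                  # the record we land on stays
        position += 1 + int(random.random() * 10)  # skip past the omitted 0-9
    return kept

print(simulate_omit_loop())  # roughly 3,000, matching the post's estimate
```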
Vaughan Posted February 21, 2006

If you are using FMP 7/8, there are ways of working with a found set that don't destroy it. The most elegant is to open a new window (off-screen if necessary) via a self-join that has the same found set, do the work in this window, then close it when done and return to the original window.
Genx (Author) Posted February 21, 2006

OK Søren, you're just confusing me. Once again, the point of the flagging is a mail merge. The user manually selects the records they want to include in the mail merge because they can; not because they haven't contacted the client in a while or anything like that, just because they can. Instead of doing a mail merge one record at a time, flagging allows records to be marked prior to a mail merge, thereby saving time. Once again, the point of this whole post was an extra flagging feature: one button that could both flag and unflag the found set, depending on whether all records in the found set were already within the flagged global. Like I said, I've given up on this because it's far too much effort for so small a problem.

Anyway, Vaughan, I'll take your information to heart, because I didn't know you could preserve a found set this way, so thanks. Thanks for everyone's help and time; know that I appreciate all of you.

~genx
Søren Dyhr Posted February 22, 2006

Alright, I finally think I got the picture. It's definitely something that requires processing at full throttle, but again, the smaller the set of data the user handles, the less the impact. I've made a template that more or less does what you're after (I think); play with it and see if you can get the finer points in it. It uses two techniques. One is Copy All Records, which is a little iffy to use because it tampers with the clipboard, so the script should take care of the restoration process as well. The other technique is one that Jeff Almquist gave some years ago in Advisor, and it utilizes a calc field in each line instead of setting the records' fields by script. Why is that better?? Only the fields shown are calculated/evaluated, and seriously, few people can comfortably scan more than 20-30 rows of records in one view. So it spreads the processing into chunks, done when, say, the scrollbar is moved, instead of doing it all in one blow into stored fields.

--sd

multitagging.zip
Søren Dyhr Posted February 22, 2006

"the most elegant is to open a new window (off screen if necessary) via a self-join that has the same found set, do the work in this window, then close it when done and return to the original window."

Well, I feel this is even better: http://www.sumware.net/robfm/savingfoundsets.php

--sd
Genx (Author) Posted February 24, 2006

I really, really like the found-set-preserving technique. The sample flagging file, however, I don't, because it uses the copy and paste functions, cluttering the user's clipboard in Windows. Thanks a lot for the heads-up on the found set thing again though; it's a lifesaver.

~Genx
LaRetta Posted February 24, 2006

Not only does storing large numbers of IDs on the clipboard clutter it, but it can also crash your Windows system, which can be irritating... I'm not responding to the found set technique but rather to the original suggestion of storing the whole load (Copy All Records of the found set) onto the clipboard.
Søren Dyhr Posted February 24, 2006

I said it was iffy, and if the number of IDs were a little less, I would definitely use Ray's CF, GetValueSet ( Field ; StartValue ):

    GetNthRecord ( Field ; StartValue ) &
    If ( IsValid ( GetNthRecord ( Field ; StartValue + 1 ) ) ;
      ¶ & GetValueSet ( Field ; StartValue + 1 ) )

From http://www.nightwing.com.au/FileMaker/demos8/demo804.html

Although it slows things down a little. By the way, is the upper limit for a CF's recursion 30,000, or...?? "Copy All Records" is the speed king, but throw Ray's CF in and see if you can live with the speed?

--sd
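The shape of that recursive CF, modeled in Python (record access stood in by a plain list; purely illustrative, not Ray's actual code):

```python
def get_value_set(field, start=0):
    """Sketch of the GetValueSet custom function quoted above:
    take the field's value from record `start`, and if a record
    at `start + 1` exists, recurse and append the rest with a
    return separator."""
    if start >= len(field):  # models IsValid(GetNthRecord(...)) failing
        return ""
    rest = get_value_set(field, start + 1)
    return field[start] + ("\n" + rest if rest else "")

print(get_value_set(["id1", "id45", "id1110"]))  # three IDs, one per line
```

Like the CF, this builds the list one record per recursion level, which is why a recursion ceiling (and speed) becomes a concern for very large found sets.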