Newbies Rhys Posted May 3, 2008
Hi there, I am wondering if anyone has a solution that would enable me to still collect addresses from the found set, but if the same email address appears twice, ignore the second one? Any ideas would be appreciated! Rhys
Søren Dyhr Posted May 3, 2008
How to avoid duplication in databases is a FAQ, but a method is this: http://sixfriedrice.com/wp/deleting-duplicate-records-in-filemaker/
Where you change their script slightly to deal with your found set, into:

Go to Layout [ record_list ]
Go to Record/Request/Page [ First ]
Loop
If [ not IsEmpty ( duplicate_records::ID ) ]
Omit Record
Else
Go to Record/Request/Page [ Next ; Exit after last ]
End If
End Loop

Your key fields for the self-join would in this case be e-mail = e-mail and RecordID ≠ RecordID. --sd
comment Posted May 3, 2008
You cannot test for duplicates in a found set by using a relationship.
Søren Dyhr Posted May 3, 2008
Yes, that's right, that is a faulty approach - but if we then use Ugo's method, a relationship could indeed still be used... with GTRR(FS). --sd
comment Posted May 3, 2008
Yes, they could - but I thought the purpose was to omit duplicates, not to use relationships... I believe a fast summary method, using Omit Multiple Records [ GetSummary (...) - 1 ], would be the most efficient here. Alternatively, if the found set is not too big, you could loop through it, using variables to compare each record to the previous one.
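A rough sketch of that fast summary approach (assuming a summary field sEmailCount defined as Count of Email, and a found set sorted by the Email field; the field and sort names here are only illustrative, not from an actual file):

Sort Records [ Restore ; No dialog ]
# sorted by Email, so GetSummary breaks on that field
Go to Record/Request/Page [ First ]
Loop
# size of the current group of identical addresses
Set Variable [ $n ; Value: GetSummary ( sEmailCount ; Email ) ]
If [ $n > 1 ]
# step onto the first duplicate, then drop the rest of the group
Go to Record/Request/Page [ Next ]
Omit Multiple Records [ No dialog ; $n - 1 ]
Else
Go to Record/Request/Page [ Next ; Exit after last ]
End If
End Loop

After the loop the found set holds one record per address and can be handed to Send Mail as-is.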
Søren Dyhr Posted May 3, 2008
I read the purpose as a way to prevent sending e-mail to the same address twice, within the found set... --sd
Newbies Rhys (Author) Posted May 4, 2008 (edited)
Thanks for everyone's efforts so far. I don't think I made myself clear enough, which is entirely my fault. I am trying to use the script step "Send Mail", and one of its options is to collect addresses from across the found set; you can specify a calculation or a field. Any ideas?
Edited May 4, 2008 by Guest
Newbies Rhys (Author) Posted May 4, 2008 (edited)
Hi there, There is probably a simple solution to this, however I cannot figure it out! Basically I have a script which loops, going to the next record of a found set, and once it gets to the last record it exits the loop and does a whole lot else, i.e. Save as PDF etc. Within the loop it sets a few fields, and one of the steps in the loop sets a variable. This variable, $sendmail, collects email addresses each time it goes around. I am wondering if anyone knows or can think of a calculation that will collect the email addresses but, if an address is already present in $sendmail, won't add it - so that when the script later sends an email using the "Send Mail" script step, it won't duplicate email addresses and send the same email to the same person two or three times! Any help would be appreciated! Rhys
Edited May 4, 2008 by Guest (similar topics, so merged)
Søren Dyhr Posted May 4, 2008
I stand by my second suggestion using Ugo's method: you have to strain the found set for duplicates, one way or the other, before the found set is used for the Send Mail. The export grouped by something, as Comment suggests - such as the e-mail addresses in this case - followed by a reimport will similarly produce a found set with one of each; but the reimport creates even more duplicate records, which end up as the found set and have to be deleted once the mail is transmitted. Comment's method is utterly fast and doesn't require any change to the relational structure; it suits a structure as flat as a pancake.

Set Variable [ $where ; Value: Get ( TemporaryPath ) & "tempfile.txt" ]
Export Records [ File Name: "$where" ; Character Set: "Macintosh" ; Field Order: Untitled::theEmailAddress ] [ No dialog ]
Import Records [ Source: "$where" ] [ No dialog ]
Send Mail [ Multiple Messages ; To: Untitled::theEmailAddress ; Message: "Blah, blahbla!" ] [ No dialog ]
Delete All Records [ No dialog ]

The good question is then whether my single line of script is any slower. In networked solutions hosted several time zones away, I honestly don't think either of the two can avoid the message box telling you a request is running. But I'm open to good explanations here - there could perhaps be a point that storing locally might be a tad faster ... only I'm not sure??? --sd
comment Posted May 4, 2008
"I don't think I made myself clear enough" - I thought you did, but now I am not sure. The Send Mail[] step can send either to the current record or to the entire found set. There is no option to send mail only to unique members of the found set. You need to work on your found set and eliminate duplicates before calling the Send Mail[] step. As I mentioned before, I believe the easy (and fast) way to do this is to modify a technique called Fast Summaries by Mikhail Edoshin: http://www.onegasoft.com/tools/fastsummaries/index.shtml

However, if you are looping through records and collecting the e-mail addresses into a variable, you can eliminate duplicates quite easily by checking if:

IsEmpty ( FilterValues ( EmailField ; $emails ) )

and adding the address to the variable only if the above returns true (this assumes the variable collects the addresses into a return-separated list).

"The export grouped by something, as Comment suggests" - This is NOT AT ALL what I suggested.

"I stand by my second suggestion using Ugo's method" - How many fields, TO's and relationships does that require? And how fast will it be? My suggestion requires one summary field and one script, and is about 9 times faster than plain looping.
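In script form, the collection step inside the loop might look roughly like this (assuming the field is Contacts::Email and the list is gathered in $sendmail, as in the earlier post; the names are only illustrative):

If [ IsEmpty ( FilterValues ( Contacts::Email ; $sendmail ) ) ]
# the address is not yet in the list, so append it
Set Variable [ $sendmail ; Value: $sendmail & If ( not IsEmpty ( $sendmail ) ; ¶ ) & Contacts::Email ]
End If

Each address is appended only once, so $sendmail ends up as a return-separated list of unique addresses that the later Send Mail step, or a second loop over the list, can use.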
Søren Dyhr Posted May 4, 2008
Where's the looping in mine, except behind the scenes - and can you assure me that yours has no looping behind the scenes either? You have a copy of Bob Weaver's test template! It kind of ruled out Fast Summaries back then when I suggested it. My solution does indeed bloat the solution with two extra TO's and a calc field, taking the RecordID for granted. GTRR(FS) is pretty fast ... but the unknown factor here is how contaminated the found set is. The Omit Multiple via GetSummary ( even stored in a variable ) requires the summary field as an extra field, and the scripting requires sorting before even thinking of eliminating dupes. This sorting gets tougher as the number of records grows. --sd
comment Posted May 4, 2008
I didn't say your method used looping - I used looping as a benchmark for testing the speed. My suggestion does use looping, but since it loops between GROUPS of records instead of individual records, it is still very fast.

"You have a copy of Bob Weaver's test template!" - Yes I do. Do you? http://www.fmforums.com/forum/showtopic.php?tid/159484/

The 'fast summary' method was the fastest in Bob Weaver's tests. I then suggested another method which was faster, but only when run a second time (after temporary indexing, I presume). AND it requires adding a self-join relationship and a calculation field. I believe switching from flagging and finding (as Bob Weaver did) to omitting directly makes the 'fast summary' method even faster. And all you need to add is a summary field.
Søren Dyhr Posted May 4, 2008 (edited)
But the test back then didn't include Ugo's with GTRR(FS), because that feature first arrived with the next version, in August 2005 with ver. 8.0.

"Yes I do. Do you?" - No, I'll let you be my Rain Man on this! : --sd
Edited May 4, 2008 by Guest
comment Posted May 4, 2008
"But the test back then didn't include Ugo's with GTRR(FS)" - OK, so my question still stands: how fast is it, and how much does it cost in terms of resources?
Ugo DI LUCA Posted May 4, 2008 (edited)
Hello, As you mentioned my method: I would rather use the UniqueList ( ) custom function, which uses CustomList ( ) here. Up to about 18700 records in a found set, a 2-line script will do it:

Set Variable [ $_emailList ; Value: CustomList ( 1 ; Get ( FoundCount ) ; "GetNthRecord ( Table::email ; [n] )" ) ]
Set Variable [ $$_emailList ; Value: UniqueList ( $_emailList ; "" ) ]

Using repeating variables, the 18700 limit may be pushed a bit further. Repeating still, but variably :
UniqueList can be found here: http://www.briandunning.com/cf/789
Edited May 4, 2008 by Guest
Søren Dyhr Posted May 4, 2008 (edited)
sorry Soren, I forgot the buttons here and edited your post !! you were saying ?
2 to 17 x quicker - and you had added your method to Bob's file in the attachment below. My 1,000 excuses.
FindDupsVariations.zip
Edited May 4, 2008 by Guest
Ugo DI LUCA Posted May 4, 2008
Then, just wanted to say I was proud to find my name in here. I certainly use this key for a lot of things. Here, though, isn't getting a list of the emails sufficient? That's why I'd play with a variable that would be used later in a loop script with the Send Mail script step.
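A rough sketch of that last part, looping over the collected list and sending one message per address (assuming the unique addresses ended up in $$_emailList as a return-separated list, as in the script above; the subject and message are placeholders):

Set Variable [ $i ; Value: 1 ]
Loop
Exit Loop If [ $i > ValueCount ( $$_emailList ) ]
Send Mail [ To: GetValue ( $$_emailList ; $i ) ; Subject: "..." ; Message: "..." ] [ No dialog ]
Set Variable [ $i ; Value: $i + 1 ]
End Loop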
comment Posted May 4, 2008
I am getting 7 seconds on Bob's fastest (RecNo + Count(SJ)), and 10 to 12 seconds on GTRR(FS). If I can find the time, I will make a comparison with my method later.
Søren Dyhr Posted May 4, 2008
I find the same, that GTRR(FS) has its competition alright - but Michael, you suggested Fast Summaries, which isn't nearly as fast as the search. Now, this isn't in any way a representative set of data: there are way too many dupes in Bob's file compared to what a list of e-mail addresses would exhibit. This means the Fast Summaries method gets closer to any other kind of looping ... it only gets fast if the breaker values are far apart. This turns our bias towards methods involving relational approaches, to an extent, no matter how much we're in love with Edoshin's crafty algorithm! There is of course some kind of C++'ish looping behind the way requests are built to emulate the relational approach as we tend to grasp it, but those are likely to be much faster than the ones we might brew up with ScriptMaker.

The entire flow in this thread reminds me of a King Crimson concert, where a certain song included some hefty sequential chord shifts played solo by Adrian Belew. He simply made an error on stage by not getting one of the chords ... he then stopped and laughed and started all over again. Now, he's in no way an underachiever on his guitars, but what a relief when he suddenly laughed and became a mere mortal.

Ugo, I like the approach of Agnès' function, and you would probably then just turn the pilcrows into commas as delimiters - but it screws up the found-set idea where more personalized mails are required. --sd
comment Posted May 4, 2008
"you suggested Fast Summaries, which isn't nearly as fast as the search" - I am not sure what you mean by "the search". So far, a "fast-summary-like" method seems the fastest, both in Bob's file and in my preliminary test. I would still prefer it even if it were a close second place, because adding two TO's to the RG just for this is not an enticing prospect.

"there are way too many dupes in Bob's file compared to what a list of e-mail addresses would exhibit" - That is a very good point, and I will take it into account when building my test.
Agnès Posted May 4, 2008
"you had added your method to Bob's file in the attachment below."
Hello, I am not sure I understand everything in this game, but when I test the 3 buttons I get different results: Classic finds 2060 records, GTRR(FS) too but not the same records, and RecNo+Count(SJ) finds 2059 records, not all the same as the other buttons. Without a link or key, I tested this:

Script:
Show All Records
Set Field [ FindDups::ElapsedTime ; Get ( CurrentTime ) ]
Sort Records [ Specified Sort Order: FindDups::AcctNbr, ascending ; FindDups::Year, descending ] [ Restore ; No dialog ]
Replace Field Contents [ FindDups::Flag ; Replace with calculation:
Case (
Get ( RecordNumber ) = 1 ; Let ( $t = FindDups::AcctNbr ; "" ) ;
FindDups::AcctNbr = $t ; 1 ;
Let ( $t = FindDups::AcctNbr ; "" )
) ] [ No dialog ]
Set Field [ FindDups::ElapsedTime ; Get ( CurrentTime ) - FindDups::ElapsedTime ]

It is very fast but - sorry if I am off topic - it is not OK with v.7 (but GTRR(FS) isn't either) : Agnès
Ugo DI LUCA Posted May 4, 2008
"Ugo, I like the approach of Agnès' function, and you would probably then just turn the pilcrows into commas as delimiters - but it screws up the found-set idea where more personalized mails are required."
You're right, I hadn't considered this case indeed.
Agnès Posted May 4, 2008 (edited)
Re, I have just tested "Replace Field Contents" into a global field, with a calculated variable, and the result seems interesting to me: it builds an ID list without changing the logs of the records. I put Søren's file back to explain it better; I added 2 scripts, the one to look at is "MarkDuplicatesNew6 _ Replace in globale". Could be something to work with.... Agnès
FindDupsVariations_A.fp7.zip
Edited May 4, 2008 by Guest
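For readers without the attached file, the idea sketched roughly (assuming a global field FindDups::gScratch exists purely as a target for the Replace, and reusing the Case ( ) pattern from the scripts above and below; the exact field and script names in the attachment may differ):

Sort Records [ Specified Sort Order: FindDups::AcctNbr, ascending ] [ Restore ; No dialog ]
Set Variable [ $idList ; Value: "" ]
Replace Field Contents [ FindDups::gScratch ; Replace with calculation:
Case (
Get ( RecordNumber ) = 1 ; Let ( $t = FindDups::AcctNbr ; "" ) ;
FindDups::AcctNbr = $t ; Let ( $idList = $idList & ¶ & FindDups::RecordID ; "" ) ;
Let ( $t = FindDups::AcctNbr ; "" )
) ] [ No dialog ]
# $idList now holds the RecordIDs of the duplicates; no regular field was modified,
# so the list can feed a global multi-key for a GTRR, as discussed in the next posts.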
Ugo DI LUCA Posted May 4, 2008
"It is very fast but - sorry if I am off topic - it is not OK with v.7 (but GTRR(FS) isn't either)"
OK, neat. But I assume the Replace Field Contents command in the examples would not be used in real life, in order not to change the logs. Waiting for Mike now :
Ugo DI LUCA Posted May 4, 2008
"without changing the logs of the records" - Not fair, you were hiding behind my shoulder ! : Quick, very quick indeed. Thanks. So now we have a global key, so we need a relation again for another GTRR :
comment Posted May 4, 2008
Indeed, I am hesitating over whether to include any methods that mark records in my test, as these would be extremely problematic in a multi-user situation. More to come, but I cannot say when...
LaRetta Posted May 4, 2008
Hey Ugo! It's Michael, not Mike. And it's good to see you still repeating and mixing them with variables! One thing I've had time to review in Bob's file is that Fast (Classic) took 2 seconds when first run. Then 3, then 3, then 2. Then I signed off and back on to repeat the test, and Fastest (Rec No) took 2 seconds for all 4 attempts. GTRR (FS) took 5, then 4 on the remaining 3. I realize that we are probably dealing with .5 seconds and that's why the numbers are different but, when testing several procedures this close, I think it would be important to take the averages, since one process (Fastest Rec No) doesn't change at all. Record marking should most certainly be taken into account! Good catch, Michael! So the numbers (on my box) are: Fast (Classic) actually takes 2.5, Fastest (Rec No) takes 2, and GTRR takes 4.25. I love speed tests; gets my head buzzin' better than any drug! :laugh2:
comment Posted May 5, 2008
"Hey Ugo! It's Michael, not Mike." - That's correct: http://www.fmforums.com/forum/showpost.php?post/269798/hl/ender+mike+michael/ Thanks for caring, Loretta. :
Søren Dyhr Posted May 5, 2008
"So now we have a global key, so we need a relation again for another GTRR" - Excellent point; it should indeed be included in the test, if we keep an eye on the initial question, which was to strain a found set of e-mail addresses to prevent sending to the same recipient twice. Not that sorting takes much effort, but as it is - doesn't the GTRR(FS), strictly speaking, require hardly any sorting at all? But which is faster: GTRR(SO) with a primary key in a global, or GTRR(FS) with the Ugo-key? Well, Agnès has the fastest approach if we measure via Kieren's method here: http://www.databasepros.com/FMPro?-DB=resources.fp5&-lay=cgi&-format=list.html&-FIND=+&resource_id=DBPros000822 .... by a whopping 4:1 - very nice indeed, Agnès! --sd
Ugo DI LUCA Posted May 5, 2008
"Hey Ugo! It's Michael, not Mike." - Oops, my bad. I was so invested in finding the "ø" that I misspelled it : Sorry Michael.
LaRetta Posted May 5, 2008
"Thanks for caring, Loretta." - And a good morning to you, too! :wink2:
Søren Dyhr Posted May 5, 2008
"I have just tested "Replace Field Contents" into a global field, with a calculated variable, and the result seems interesting to me: it builds an ID list without changing the logs of the records"
The way your Replace calc is made, you make it harder when there are only a few "offenders" to get rid of, because it collects all the first occurrences - in this case it is a few e-mail addresses that might occur more than once, whereas your approach holds when the ratio is the opposite, as it is with Bob Weaver's set of data. So I would suggest the otherwise indeed punchy algorithmic approach of yours gets changed into:

Case (
Get ( RecordNumber ) <> 1 and FindDups::AcctNbr = $t ;
Let ( $l = $l & ¶ & FindDups::RecordID ; "" ) ;
Let ( $t = FindDups::AcctNbr ; "" )
)

I learned this by downloading Dunning's huge set of unique nonsense addresses: http://www.briandunning.com/sample-data/ ... and decided to contaminate the data ever so slightly. What I instead used was this:

Case (
Get ( RecordNumber ) <> 1 and Contacts::FirstName = $t and Contacts::LastName = $u and Contacts::ZipCode = $w ;
Let ( $theIDs = $theIDs & ¶ & Contacts::RecordID ; "" ) ;
Let ( [ $t = Contacts::FirstName ; $u = Contacts::LastName ; $w = Contacts::ZipCode ] ; "" )
)

Simply because I wanted to try it against the SixFriedRice approach I referred to at the beginning of this thread. --sd