MacFileman Posted October 28, 2018 Hey... I have a database with 60,000-plus names and addresses. I need to mail to exactly 40,000. I can do this manually... but I was wondering if there is a way to have FM extract exactly 40,000 addresses at random. Thanks in advance. Mike
comment Posted October 28, 2018 2 hours ago, MacFileman said: "I can do this manually.." If it can be done manually, then surely it can also be scripted? Say something like (pseudocode):

Go to Layout [ Target table ]
Show All Records
Loop
  Exit Loop If [ Get ( FoundCount ) ≤ 40000 ]
  Go to Record/Request/Page [ Int ( Random * Get ( FoundCount ) ) + 1 ]
  Omit Record
End Loop

Note that this could be sped up by omitting chunks of records at a time instead of just one. However, you haven't told us what the purpose of this exercise is, and we also don't know how your records were created over time; ostensibly, this could have the side effect of reducing the "randomness" of the selection. Alternatively, define a calculation field (result is Number) = Random, sort the records by this field, go to the first record and use the Omit Multiple Records script step to omit Get ( FoundCount ) - 40000 records, which brings the found set to the desired size. If you need to do this periodically, make this field unstored in order to get a different set chosen randomly every time. Edited October 28, 2018 by comment
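Outside FileMaker, the two ideas above can be checked with a minimal Python sketch, with the found set modelled as a plain list. The function names and the list representation are assumptions for illustration, not FileMaker features. Both functions return a uniformly random subset of the requested size; the first keeps the survivors in their original order, the second returns them in random order.

```python
import random

def omit_one_at_a_time(records, target, rng=random):
    """Mimic the script loop: omit one randomly chosen record until `target` remain."""
    found = list(records)  # work on a copy, like a found set
    while len(found) > target:
        # Int ( Random * Get ( FoundCount ) ) + 1 is 1-based; Python lists are 0-based
        index = int(rng.random() * len(found))
        del found[index]
    return found

def sort_by_random_key(records, target, rng=random):
    """Mimic a calculation field = Random: sort by a random key, keep the first `target`."""
    keyed = [(rng.random(), record) for record in records]
    keyed.sort(key=lambda pair: pair[0])  # compare only the random keys
    return [record for _, record in keyed[:target]]
```

Calling either function with a 60,000-item list and `target=40000` corresponds to the scenario in this thread.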
MacFileman (Author) Posted October 29, 2018 Thanks for the reply. As I stated, this is just for a mailing. They have 60K+ addresses and only have the budget to mail 40K postcards. They had asked me to remove random addresses to get to 40,000, but I figured FM can do this automatically. It is just a list given to me in Excel with name, address, city, state and zip. I plan to pull it into FileMaker and randomly extract 40,000. Thank you for your fast feedback. I'll be looking at this later today.
comment Posted October 29, 2018 1 hour ago, MacFileman said: "They had asked me to remove random addresses to get to 40,000" I am not sure you understand my concern here. The simplest method to reduce 60k+ records to 40k is to cut off the first (or the last) 20k+. "First" could mean first in creation order, or in alphabetical order, or in any other order by which the records can be sorted. But you say you want to remove random records - and you don't say why. Why does this matter? Because "random" can be costly to implement - esp. if you want to implement it perfectly. Take the very first method I suggested: it requires no changes to the file's schema; all the logic is contained in the one script. It is also perfectly random (or at least as random as you can get using a computer). However, omitting 20k+ records one by one will take some time. How much time? I don't know, and I am not going to try to find out. But if it turns out to take too long, you will have to either add more resources to the file or reduce the level of randomness of the selection. And whether reducing the level of randomness is permissible, and if so by how much, cannot be answered without knowing the reason for the original requirement of randomness. Edited October 29, 2018 by comment
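To make the speed-versus-randomness trade-off concrete, here is a minimal Python sketch (not FileMaker script) of one hypothetical way to omit records in chunks: each pass removes a run of consecutive records starting at a random position. The function name and the `chunk` parameter are illustrative assumptions. With `chunk=1` it behaves like omitting one record at a time; larger chunks need far fewer passes, but records that are adjacent in the current sort order (for example, neighbours in the same town when the list is sorted by town) tend to be dropped together, so the selection is less random.

```python
import random

def omit_in_chunks(records, target, chunk, rng=random):
    """Remove runs of up to `chunk` consecutive records until `target` remain.

    Returns the remaining records and the number of passes taken.
    """
    if chunk < 1:
        raise ValueError("chunk must be at least 1")
    found = list(records)  # work on a copy, like a found set
    passes = 0
    while len(found) > target:
        # never omit more records than needed to reach the target
        size = min(chunk, len(found) - target)
        # random start position such that the whole run fits in the list
        start = rng.randrange(len(found) - size + 1)
        del found[start:start + size]
        passes += 1
    return found, passes
```

For example, going from 60,000 to 40,000 records with `chunk=500` takes 40 passes instead of 20,000.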
MacFileman (Author) Posted October 29, 2018 Ah... obviously the easiest way is to remove the last 20,000 or so, or the first 20,000. The database spans 20 or so towns in a county in NJ. Doing so would remove an entire town from the list. So we wanted to randomize the removal.
comment Posted October 29, 2018 And what's wrong with removing an entire town?
OlgerDiekstra Posted October 30, 2018 Then the safest way to get a random 40.000 is to do it per town. Otherwise you run the risk of excluding towns for which you only have a few addresses. Create totals of addresses in each town, turn those totals into a percentage of the total addresses, and then calculate how many addresses each town should get based on that percentage of 40.000. Then create a random list for each town, which should ensure you have addresses from every town. This topic https://community.filemaker.com/thread/79123 discusses how to randomly remove records from a found set until you have the amount you need. Caution: this may not be very quick.
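The per-town approach can be sketched in Python as follows (a minimal sketch, not FileMaker script). The post does not say how to round each town's percentage; this sketch assumes largest-remainder rounding, done in integer arithmetic, so the quotas add up to exactly the requested total. `town_quotas`, `sample_per_town` and the `town_of` accessor are hypothetical names.

```python
import random
from collections import defaultdict

def town_quotas(counts, total):
    """Split `total` across towns in proportion to each town's address count.

    Largest-remainder rounding (an assumption) makes the quotas sum to `total`.
    """
    population = sum(counts.values())
    if not 0 <= total <= population:
        raise ValueError("total must be between 0 and the number of addresses")
    # integer floor of each town's exact share: n * total / population
    quotas = {town: n * total // population for town, n in counts.items()}
    leftover = total - sum(quotas.values())
    # give one extra slot to the towns with the largest fractional parts
    by_remainder = sorted(counts, key=lambda t: counts[t] * total % population, reverse=True)
    for town in by_remainder[:leftover]:
        quotas[town] += 1
    return quotas

def sample_per_town(records, town_of, total, rng=random):
    """Draw a random sample from each town, sized by its quota."""
    groups = defaultdict(list)
    for record in records:
        groups[town_of(record)].append(record)
    quotas = town_quotas({town: len(g) for town, g in groups.items()}, total)
    chosen = []
    for town, group in groups.items():
        chosen.extend(rng.sample(group, quotas[town]))
    return chosen
```

With counts of 30,000, 29,998 and 2 addresses and a total of 40,000, the quotas come out as 20,000, 19,999 and 1. Proportional quotas do not strictly guarantee every town a slot, though: a town whose share is below one address can still receive zero when there are more small towns than leftover slots, so a hard "every town" rule would need an explicit minimum per town.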
comment Posted October 30, 2018 17 minutes ago, OlgerDiekstra said: "Then the safest way to get a random 40.000 is to do it per town." What do you mean by "then"? Do you have some additional information that would warrant such a conclusion? The only thing we have been told so far is that there is a requirement to select the records "at random". And I would say that selecting records per town is less "random" than selecting them from the entire population - perhaps significantly so. So the question remains: what is the real purpose here? Until we have an answer to that, we won't be able to say what the right way to proceed is. 28 minutes ago, OlgerDiekstra said: "This topic https://community.filemaker.com/thread/79123 discusses how to randomly remove records from a found set until you have the amount you need." Hm, I thought I had already covered that in my first post in this thread.
OlgerDiekstra Posted October 30, 2018 1 hour ago, comment said: "What do you mean by 'then'? Do you have some additional information that would warrant such a conclusion? The only thing we have been told so far is that there is a requirement to select the records 'at random'. And I would say that selecting records per town is less 'random' than selecting them from the entire population - perhaps significantly so." MacFileman's comments: "The database spans 20 or so towns in a county in NJ. Doing so would remove an entire town from the list. So we wanted to randomize the removal." So to prevent a town with, say, only 2 addresses listed from being left out of the random list, creating a random list of addresses per town would ensure every town has at least a few addresses included. So over the course of the thread, another requirement became apparent: every town must be included. 1 hour ago, comment said: "So the question remains: what is the real purpose here? Until we have an answer to that, we won't be able to say what the right way to proceed is. Hm, I thought I had already covered that in my first post in this thread." Huh, so you did. 😄
comment Posted October 30, 2018 1 hour ago, OlgerDiekstra said: "So over the course of the thread, another requirement became apparent: every town must be included." I don't see that as a requirement. All he said was that removing the first 20k records could remove an entire - presumably large - town from the list (which I think indicates that the list is sorted by town by default). That doesn't mean that every town, no matter how small, must be represented, even at the cost of skewing the overall statistics. Heck, it doesn't even mean that every large town must be represented. For all I know, it could mean nothing at all. That's why I am struggling so hard to get him to tell us what they're really trying to achieve here. The objection to removing an entire town makes no sense unless it is followed by "because that would defeat our goal of ... ???". Of course, it's entirely possible - even likely - that they simply haven't thought this through in detail, and that they only have a vague notion that a random selection would be best, if only because no better-suited criterion comes to mind. I am going to go out on a limb and make a prediction here: I predict that there is no statistical significance to the name of the street on which a person lives, or to the number of their house. Therefore, if we sort the records by street name or by house number (regardless of the town) and drop the first or last 20k+ records from the list, we will have solved the OP's problem in an entirely satisfactory manner. Edited October 30, 2018 by comment
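A minimal Python sketch of this suggestion, assuming each record exposes its street name (the `street_of` and `town_of` accessors and both function names are hypothetical). The second helper only reports how many kept addresses fall in each town, so the result can be inspected rather than taken on faith; it does not test the prediction that street names carry no statistical significance.

```python
from collections import Counter

def keep_by_street(records, target, street_of, drop_from_start=False):
    """Sort by street name and keep `target` records, dropping the rest from one end."""
    ordered = sorted(records, key=street_of)
    target = min(target, len(ordered))
    if drop_from_start:
        return ordered[len(ordered) - target:]  # drop the first records
    return ordered[:target]  # drop the last records

def town_counts(records, town_of):
    """Count how many kept addresses fall in each town, to inspect the spread."""
    return Counter(town_of(record) for record in records)
```

Sorting by house number instead only requires passing a different key function.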
OlgerDiekstra Posted October 30, 2018 Ah, you may not, but MacFileman, or rather 'They', as in MacFileman's client, may well believe that it is, and they may be right. But even if they're not, they will still have to travel the path until they come to the conclusion that it is not. We don't know a lot of things, and more often than not, we never will know everything. But that's how life works. But you are right that we need to provoke and stimulate the thinking process.