Newbies sbcne Posted January 9, 2008 Newbies Posted January 9, 2008 I have a requirement to randomly mark a record as it is created during a month and end up at the end of the month with a minimum of 10% of the records marked. Using: Int(10*random) and then marking a record if the result is 5 (could use any number 0-9)is close. For 10,000 records you may get 978 one month and 1014 the next month. Is there a formula that can be used to insure that minimum 10% requirement is met each month?
The Shadow Posted January 9, 2008 Posted January 9, 2008 This is not going to be possible in general, a bell-curve around 10% is often going to be under it (about half the time). You could move your attempted percentage up to 12% or higher to reduce the amount it happens drastically though. I believe: (random <= 0.12) should return true about 12% of the time, btw.
comment Posted January 9, 2008 Posted January 9, 2008 I agree with Shawn that it's theoretically impossible to satisfy both requirements (although I don't quite see how a bell curve figures in this). The best you can do with the current method is to raise your chances of success (i.e. the level of confidence) - but there will always be some months falling under the minimum required 10%. The only way to eliminate them completely is to mark ALL records. It would be different if the sampling process were moved to the end of each month: then you could pick random records until you have a sample size of 10% of the selected group. But this would, to some extent, skew the overall "randomness" of the marked records in general.
Mikhail Edoshin Posted January 9, 2008 Posted January 9, 2008 If you need to guarantee the 10% limit you'll have to analyze all records each time. That is I'd try to implement the following algorithm: - See how many marked and unmarked records are there. - See what happens if I don't mark the new record. If it makes less than 10% of this month's records marked, I have to mark it. I don't know the future, so I need to assume this may be the last record for this month and play safe. This means the first record of every month is always marked. - If I don't have to mark this record, then I mark it randomly. Since the mandatory marking rule will catch up, if necessary, I can use any distribution, e.g. 1% or 50%. But different distribution will affects the pattern, e.g. with 1% marked most marks will be placed by the mandatory rule, so they will probably run in rows. So it's better to have the distribution also around 10%; in this case only few marks will be forced. - This doesn't account for deleted records. - It would be much simpler to randomly select 10% of records at the end of a month. Especially if I need to handle deleted records as well. And it will be way faster, as I won't have to analyze the whole month every time. So I'd try to convince the customer to change their mind.
Newbies sbcne Posted January 9, 2008 Author Newbies Posted January 9, 2008 Thanks for the replies .... actually your first suggestion to use random <= .12 was what worked best (and is the best fit to the real application) because physical activity has to take place if a record is selected (quality control takes place immediately) AND they can live very comfortably with a percent closer to 12% than coming in under 10%. The simple solution never occurred to me .... thanks.
The Shadow Posted January 10, 2008 Posted January 10, 2008 I agree with Shawn that it's theoretically impossible to satisfy both requirements (although I don't quite see how a bell curve figures in this). What I meant was if you just pick your random 10% in the simple way, some months you would have 9.8%, some 10.7%, etc. If you plotted those points over a long number of months, I would bet you would see a normal distribution (bell) curve around the 10% point.
The Shadow Posted January 10, 2008 Posted January 10, 2008 Yes, that's more correct, the discrete version makes sense in this case.
Recommended Posts
This topic is 6508 days old. Please don't post here. Open a new topic instead.
Create an account or sign in to comment
You need to be a member in order to leave a comment
Create an account
Sign up for a new account in our community. It's easy!
Register a new accountSign in
Already have an account? Sign in here.
Sign In Now