Jump to content

This topic is 6258 days old. Please don't post here. Open a new topic instead.

Recommended Posts

  • Newbies
Posted

I have a requirement to randomly mark a record as it is created during a month and end up at the end of the month with a minimum of 10% of the records marked.

Using: Int(10*random) and then marking a record if the result is 5 (could use any number 0-9)is close. For 10,000 records you may get 978 one month and 1014 the next month.

Is there a formula that can be used to insure that minimum 10% requirement is met each month?

Posted

This is not going to be possible in general, a bell-curve around 10% is often going to be under it (about half the time). You could move your attempted percentage up to 12% or higher to reduce the amount it happens drastically though.

I believe:

    (random <= 0.12)

should return true about 12% of the time, btw.

Posted

I agree with Shawn that it's theoretically impossible to satisfy both requirements (although I don't quite see how a bell curve figures in this). The best you can do with the current method is to raise your chances of success (i.e. the level of confidence) - but there will always be some months falling under the minimum required 10%. The only way to eliminate them completely is to mark ALL records.

It would be different if the sampling process were moved to the end of each month: then you could pick random records until you have a sample size of 10% of the selected group. But this would, to some extent, skew the overall "randomness" of the marked records in general.

Posted

If you need to guarantee the 10% limit you'll have to analyze all records each time. That is I'd try to implement the following algorithm:

- See how many marked and unmarked records are there.

- See what happens if I don't mark the new record. If it makes less than 10% of this month's records marked, I have to mark it. I don't know the future, so I need to assume this may be the last record for this month and play safe. This means the first record of every month is always marked.

- If I don't have to mark this record, then I mark it randomly. Since the mandatory marking rule will catch up, if necessary, I can use any distribution, e.g. 1% or 50%. But different distribution will affects the pattern, e.g. with 1% marked most marks will be placed by the mandatory rule, so they will probably run in rows. So it's better to have the distribution also around 10%; in this case only few marks will be forced.

- This doesn't account for deleted records.

- It would be much simpler to randomly select 10% of records at the end of a month. Especially if I need to handle deleted records as well. And it will be way faster, as I won't have to analyze the whole month every time. So I'd try to convince the customer to change their mind.

  • Newbies
Posted

Thanks for the replies .... actually your first suggestion to use random <= .12 was what worked best (and is the best fit to the real application) because physical activity has to take place if a record is selected (quality control takes place immediately) AND they can live very comfortably with a percent closer to 12% than coming in under 10%. The simple solution never occurred to me .... thanks.

Posted

I agree with Shawn that it's theoretically impossible to satisfy both requirements (although I don't quite see how a bell curve figures in this).

What I meant was if you just pick your random 10% in the simple way, some months you would have 9.8%, some 10.7%, etc. If you plotted those points over a long number of months, I would bet you would see a normal distribution (bell) curve around the 10% point.

This topic is 6258 days old. Please don't post here. Open a new topic instead.

Create an account or sign in to comment

You need to be a member in order to leave a comment

Create an account

Sign up for a new account in our community. It's easy!

Register a new account

Sign in

Already have an account? Sign in here.

Sign In Now
×
×
  • Create New...

Important Information

By using this site, you agree to our Terms of Use.