Unique Random Numbers


This topic is 3436 days old. Please don't post here. Open a new topic instead.

Why don't you simply use an auto-generated serial number as the ID? There are only 9,000 numbers that are 4 digits long - hardly enough to distribute randomly among hundreds(?) of students without duplication. Of course, you could devise a mechanism to generate a new number instead of a duplicate - but eventually you will distribute all or almost all of the available numbers, same as if you handed them out in order to begin with.

 

 

 

I know of the Int ( "0000" ; Random * 10 ).

 

You know more than I do ... :no:


There is a Get ( UUID ) function which generates a random ID, but it's alphanumeric and 36 characters long.

Four characters doesn't seem to give enough possible random IDs.


I believe the answer was clearly given by Comment here:  http://fmforums.com/forum/topic/94199-unique-random-numbers/?p=430764

 

Why not use a regular auto-enter serial number, incrementing by 1?  I see you rate yourself a beginner, so I'll explain the (not very) complex process below:

 

  1. Create a field called StudentID or simply ID (result is number).  There should be a unique key in every table.
  2. Go to Options and at the auto-enter tab, select serial increment starting at 1, increment by 1.

Done.   :laugh2:

 

Anyway, if there is some reason you can't use this method, please explain and we'll help you through it. 


BTW, you can use an incrementing serial as GisMo and Comment suggest, or you can use UUID, as indicated by Mike KD.


Using SerialIncrement ( "0000" ; Random * 10000 ), I created only 156 records and already hit a duplicate.

 

There's actually a way to compute the odds of getting a duplicate and they are much higher than most people would guesstimate; this is known as the birthday paradox.
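For the curious, those odds can be computed directly. Here is a minimal Python sketch of the birthday-problem calculation, assuming a pool of 10,000 equally likely serials (the range produced by Random * 10000):

```python
# Probability of at least one duplicate when drawing n random IDs
# from a pool of d equally likely values (the "birthday problem").

def duplicate_probability(n, d):
    """P(at least one collision) among n independent uniform draws from d values."""
    p_no_collision = 1.0
    for k in range(n):
        # k values are already taken, so the (k+1)-th draw avoids them
        # with probability (d - k) / d
        p_no_collision *= (d - k) / d
    return 1 - p_no_collision

print(duplicate_probability(100, 10_000))   # ≈ 0.39
print(duplicate_probability(156, 10_000))   # ≈ 0.70
```

So hitting a duplicate within 156 records, as described above, is actually more likely than not.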


Oh wow.  This is very cool.  "However, 99.9% probability is reached with just 70 people, and 50% probability with 23 people."

 

We (at least I) assume that, when dealing with probabilities, a 'one-out-of-a-hundred' chance (for example) would be good odds, but this (and my test) shows that ANY odds are bad (or good) odds, depending upon your perspective.  And, as with FileMaker, everything is perspective.  

 

Thank you for pointing out that link, since *logical mathematics is fascinating.  I'm no mathematician, that's for sure, but I DID look at your 9,000 and began to reach for Combination() to figure out how you computed it (or whether that was even the function to use).  Instead I was lazy and used a loop script to create new records with unique validation and waited for it to fail.   :jester: 

 

The issue I have with some of these concepts is that proving/disproving can be difficult: it is not always clear whether your logic and approach are correct, since there is no answer book to compare against.  And the problem with my method is that, if I were projecting the number of stars in the universe, FM would still be looping after our sun finally died!  

 

 

*I suppose "logical mathematics" is a bit tautological but you get my drift, I'm sure.


I should have clarified ... I read about Birthday Paradox here: http://en.wikipedia.org/wiki/Birthday_problem


I'll play with Combination() anyway.  I haven't looked at that function for probably 6 years and it isn't used very often.


As others have explained, using an auto-entered serial number is much easier than creating random fixed-length IDs. It also avoids an inevitable problem with fixed-length IDs where any long-lived system will eventually run out.

 

However, if it's really what you want, there is a way to get such random IDs without creating any duplicates or doing computationally expensive rejection sampling. First, create a list of all the valid IDs in order: 1000, 1001, 1002, ... , 9998, 9999. Second, shuffle the values into a random order. If your list is return-delimited, you could use the ValueShuffle custom function. Otherwise, you could still use the logic as a guideline for shuffling values stored some other way. Third, use the IDs from the shuffled list in order.
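ValueShuffle is the FileMaker route; the same three steps can be sketched in Python, where random.shuffle does the second step directly:

```python
import random

# Shuffled-pool approach: list every valid 4-digit ID once, shuffle,
# then hand them out in order -- duplicates are impossible by construction.
pool = [f"{n:04d}" for n in range(1000, 10_000)]  # "1000" ... "9999"
random.shuffle(pool)                              # random order, each ID exactly once

# Assign IDs by popping from the shuffled pool as students are created.
next_id = pool.pop()
```

The trade-off is that the shuffled list must be stored somewhere and consumed in order, but in exchange there is no duplicate-checking at assignment time.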


Instead I was lazy and used loop script to create new records with validation unique and waited for it to fail.

That's a fine method too, provided you implement it correctly. If you want to ascertain odds by experiment, you must repeat the experiment a significant number of times.

 

For example, generate 100 records and see if you get a duplicate. Repeat 100 times. Then divide the number of experiments that did produce a duplicate by 100 and that's your percentage. If it agrees with the calculated odds of ~40%, then (most likely) both methodologies were correct.


Otherwise, you could still use the logic as a guideline for shuffling values stored some other way. Third, use the IDs from the shuffled list in order.

 

This is a good option.  I wonder how this would compare to just creating 9,999 records in a serial table (it can even be pre-padded with zeros) and then finding a record at random and setting a field to 'claim' the record's ID, eliminating the potential for record locks.  These IDs can be indexed ... I'm just thinking out loud about options.  

 

Great discussion guys!  

 

It needs to be 4 digits long.  This number is going to be a unique ID for 6th, 7th, and 8th grade students. 

 

Why only 4 digits?  Is there a government form which only allows 4 slots, or does it have to match some other system?  You can always use a different field to hold the other system's values - no problem - but a 'unique ID for 6th, 7th, and 8th grade students' which will be used to hold your relationships together should be a meaningless, auto-entered ID.


I wonder how this would compare to just creating 9,999 records in a serial table (it can even be pre-padded with zero) and then finding a record at random, setting a field to 'claim' the record's ID, eliminating potential of record lock. 

 

That is exactly what I would do, if this were about allocating assets at random - such as raffle tickets or parking lots. For meaningless (asset-less) IDs, I would just loop-generate random numbers until a unique one was found (provided, of course, that the allowed range was large enough in proportion to the expected population).
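That loop-generate idea might look like the following minimal Python sketch (new_unique_id is a hypothetical helper; rejection sampling stays cheap as long as the pool dwarfs the population):

```python
import random

def new_unique_id(used):
    """Loop-generate a random 4-digit ID until one not already in use is found."""
    while True:
        candidate = str(random.randint(1000, 9999))
        if candidate not in used:
            used.add(candidate)
            return candidate

# With a pool of 9,000 values, allocating a few hundred IDs rarely retries.
used = set()
ids = [new_unique_id(used) for _ in range(500)]
```

If the population ever approached the pool size, the retries would dominate and the shuffled-pool approach would be the better choice.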


I wanted to provide Luis a closer estimate of the risk of duplicates, and I also wanted to know myself using the experiment method (I'm still working on the other), so I created the following script, which tells me repeatedly that I get 38-42 fails (a fail being a group of 100 records containing EVEN ONE duplicate within its loop).  It was interesting that the actual number of duplicate serials was between 55-60.

 

For example, generate 100 records and see if you get a duplicate. Repeat 100 times. Then divide the number of experiments that did produce a duplicate by 100 and that's your percentage. If it agrees with the calculated odds of ~40%, then (most likely) both methodologies were correct.

 

I believe that the logic behind my script is dependable but I will post it here in case others find this stuff interesting also or if someone spots an error in my thinking - I'd love to know about it.

# Testing predictability on 'serial' field, auto-enter of: SerialIncrement ( "0000" ; Random * 10000 )
#
# Create 100 records in loop B within 100 loops in loop A = 10,000 records total
Loop
  # - - - - - A) Repeat 100 times.
  Exit Loop If [ 
    Let ( [
    $loops = $loops + 1 ;
    $countDups = "" ;
    $i = ""
    ] ;
    $loops > 100
  ) ]
  #
  Loop
  # - - - - - B) Generate 100 records and see if you get a duplicate.
    Exit Loop If [ Let ( $i = $i + 1 ; $i > 100 ) ]
    New Record/Request
    # Count records sharing this serial; anything beyond the record itself is a duplicate
    Set Variable [ $countDups; Value:$countDups +
      ExecuteSQL (
      " SELECT COUNT (*)
      FROM data
      WHERE serial = ?"
      ; "" ; "" ; data::serial ) - 1 ]
  End Loop
  #
  Set Variable [ $totalDups; Value:$totalDups + $countDups ]
  Set Variable [ $totalFails; Value:$totalFails + GetAsBoolean ( $countDups ) ]
  #
  Delete All Records [ No dialog ]
End Loop
#
# - - - - - Then divide the number of experiments that did produce a duplicate by 100 and that's your percentage.
Set Variable [ $percent; $totalFails / 100 ]
#
# If it agrees with the calculated odds of ~40%, then (most likely) both methodologies were correct.
#
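The logic of the script above can be re-run as a minimal Python sketch (the seed is arbitrary, only there for a repeatable run):

```python
import random

random.seed(1)  # arbitrary seed, for a repeatable run

trials = 100
fails = 0        # trials containing at least one duplicate serial
total_dups = 0   # duplicate serials summed across all trials

for _ in range(trials):
    # Generate 100 records, each with a random serial from 0000-9999
    serials = [random.randrange(10_000) for _ in range(100)]
    dups = len(serials) - len(set(serials))
    total_dups += dups
    fails += dups > 0

print(fails / trials)   # should land near the calculated ~0.39
```

A run of this finishes in milliseconds, so it is also an easy way to bump the trial count up into the thousands and watch the result converge on the birthday-problem odds.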

50-60 duplicates, meaning 100-120 student records out of 10,000, is a huge problem!  When a relationship depends upon a unique ID, even a SINGLE duplicate cannot be tolerated, but we sometimes lie to ourselves and believe the odds are too long to worry about.  They aren't.   :crazy2:


I've only glanced at it, but it looks fine to me - and the fact that the result so closely matches the prediction is a credit to the law of large numbers... However:

 

50-60 duplicates, meaning 100-120 student records out of 10,000

 

Oh, no. That's not how it works. The odds of getting a duplicate increase with each new record created. You have only tested the number of duplicates within the first 100 records. If you generate 10,000 records in a row, the number of duplicates (and triplicates, quadruplicates, etc.) will be MUCH higher than that.


Ah! You mean because I deleted the 100-record set between each experiment? I did that so the ExecuteSQL() would work, LOL. I suppose our worst enemy in working through the logic of things is Self. :-) When I use a self-join based upon data::serial = data 2::serial to count the duplicates instead of ExecuteSQL(), the count jumps dramatically - THIS is what you meant, and it also fits more closely with my first test!!

 

Luis, if you eventually use all 10,000 IDs, approx. 3,100 duplicate IDs will be generated, affecting (up to) 6,200 student records in a negative way. That is TERRIBLE odds!! What an intriguing subject!

 


 affecting (up to) 6,200 student records in a negative way.

 

Yes, the "up to" is important, because of triplicates and higher. Although I think you meant to write 6,900 (10,000 - 3,100)?

 

 

Once again, we can predict the results statistically. Using the binomial distribution function, we can calculate how many of the 10,000 random numbers will be assigned to exactly one student. That probability is given by:

BinomDist ( 1 ; 10000 ; 0.0001 ; 0 ) = 0.3678978362165516

So there will be, on average, 3,679 students with a unique ID and all the rest of them - 6,321 students - will have a non-unique ID (duplicate or higher).
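That BinomDist value can be cross-checked with a short Python sketch (binom_pmf is a hypothetical helper mirroring the k, n, p arguments above; the trailing 0 in the FileMaker call just selects the non-cumulative form):

```python
from math import comb

def binom_pmf(k, n, p):
    """Probability of exactly k successes in n trials with success probability p."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

# Probability that a given ID is drawn exactly once in 10,000 draws
# when each draw hits it with probability 1/10,000.
p_exactly_once = binom_pmf(1, 10_000, 0.0001)

print(round(10_000 * p_exactly_once))   # ≈ 3679 students with a unique ID
```

Note that 0.3679 is very nearly 1/e, which is no accident: (1 - 1/n)^(n-1) tends to 1/e as n grows.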

 

 

 

What an intriguing subject!

 

You think?


So there will be, on average, 3,679 students with a unique ID and all the rest of them - 6,321 students - will have a non-unique ID (duplicate or higher).

 

You pointed out the issue more clearly for Luis than I did!!  It is a HUGE  issue!  Your custom function was exactly what I was trying to work through!!  

 

And yes, I think it is a fascinating subject!  Maybe I need to get out more.  :idot:

