Jump to content
Server Maintenance This Week. ×

Poll: Auto Enter Serial vs Unique Identifier


This topic is 5049 days old. Please don't post here. Open a new topic instead.

Recommended Posts

I recently started to read up on the practice of using Universally Unique Identifiers (UUIDs or GUIDs) for primary keys instead of the built-in auto enter serial method. I see a lot of merit in using UUIDs, yet I think I still see most people on the forum recommending sticking with FM's auto enter serials. I'm curious to know how many of you out there use UUIDs vs serials.

If you are using UUIDs, can you elaborate on what calculation/CF you use to generate your UUIDs? I'd also be interested to know if anybody can point out any good reasons not to use UUIDs.

Feel free to discuss either method (or any other method) in this thread.

Link to comment
Share on other sites

Hi comment,

In the topic you linked to, are you referring specifically to this post?:

BTW, I don't think a "constructed" or a random UID can be GUARANTEED to be unique - at best, it can reduce the chances of a duplication to "negligible". Of course, negligible is a subjective term.

I haven't examined Ray's uID technique, so let me take a generic example of combining SystemNICAddress, RecordID and CurrentTimestamp. This seems to be a smart combination, since each computer is unique by virtue of its NIC address, and each record - even if created on the same computer within the same second - is unique by virtue of its RecordID.

Scenario:

After a prolonged use, involving the creation of many records AND ALSO DELETION of some of them, the solution is moved to a fresh clone. Old records are imported - but the deleted records mean that the system will now repeat some of the more recent RecordID's. The NIC addresses are of course the same as before, so all that's left is for some naughty employee to turn back the clock on their laptop (perhaps to fool the demo of some downloaded game), and...

I can see how that method of generation could lead to duplicates. However, I am thinking of using a Version 4 type of UUID, which relies on randomly generate hexadecimals:

from http://en.wikipedia.org/wiki/UUID#Version_4_.28random.29

Version 4 (random)

Version 4 UUIDs use a scheme relying only on random numbers. This algorithm sets the version number as well as two reserved bits. All other bits are set using a random or pseudorandom data source. Version 4 UUIDs have the form xxxxxxxx-xxxx-4xxx-yxxx-xxxxxxxxxxxx with hexadecimal digits x and hexadecimal digits 8, 9, A, or B for y. e.g. f47ac10b-58cc-4372-a567-0e02b2c3d479.

If I were to follow that exact scheme, there are 30 random hexadecimal digits plus 2 special digits. That's at least 16^30 possible permutations, and an astronomically low chance of generating a duplicate. I think combining this with some digits based on the timestamp would further reduce the chances of a duplicate.

Would you (personally) not trust this form of UUID ?

Link to comment
Share on other sites

Well, "astronomically low" is still larger than zero... And I remain a bit wary of Filemaker's Random function. It's only recently been fixed, and we still don't know how it's seeded (and have no control over it).

Anyway, my main point is further down in that thread, namely: why bother with a UID, unless you really need it for some special purpose?

Link to comment
Share on other sites

Well, "astronomically low" is still larger than zero... And I remain a bit wary of Filemaker's Random function. It's only recently been fixed, and we still don't know how it's seeded (and have no control over it).

Point taken. And I guess, along the lines of what you said in the other post, the definition of 'negligible risk' is subjective. The obscurity of FM's random function is a little bit of a worry, even though it seems to work properly now.

Anyway, my main point is further down in that thread, namely: why bother with a UID, unless you really need it for some special purpose?

Well, I came to read about UUIDs as I'm working on a system that needs to be distributed and synced. And I will probably have to work on a few more systems like this in the future. For these particular cases, I'm already 99.9% convinced that I want to use UUIDs (as opposed to concatenating nodeIDs etc).

But I also see merit in using it in all my other solutions. I know there are at least a few FM developers who use UUIDs (though none of them seem to have registered in this poll yet). I guess the biggest drawcard to me is the idea that I can restructure a db and/or import data without needing to worry about handling the serials.

Anyway, I don't want to drag you into a lot of back-and-forth on the topic. I understand your points (including what you've said in other threads) and I am taking them into consideration. Thanks for your input. If I decide to go ahead with UUIDs and that decision ends up causing me problems, I'll post back here so you can say "I told you so!" :

I guess what I'd be most interested to know is: For those (apparently few) people using UUIDs in FM, have you come across any real downsides or snags (eg. a dupe) that might make one reconsider using them?

I'm leaning towards using a combination of timestamp, MAC address, and a series of random digits. All encoded in Base 36 to try to keep the length down to something reasonable (around 24 char). I believe the use of the (almost) unique combination of timestamp and MAC address, combined with the random digits (probably ten Base 36 digits) will satisfy my need for a key that is close enough to being both random and unique.

Link to comment
Share on other sites

If I decide to go ahead with UUIDs and that decision ends up causing me problems, I'll post back here so you can say "I told you so!"

:

I will add one point, though: a MAC address is supposed to be unique; as long as your nodes do not exchange hardware, it uniquely identifies the node. So basically you are debating:

NodeID & RandomID

vs.

NodeID & SerialID

Link to comment
Share on other sites

Well, I came to read about UUIDs as I'm working on a system that needs to be distributed and synced. And I will probably have to work on a few more systems like this in the future. For these particular cases, I'm already 99.9% convinced that I want to use UUIDs (as opposed to concatenating nodeIDs etc).

.....

I'm leaning towards using a combination of timestamp, MAC address, and a series of random digits. All encoded in Base 36 to try to keep the length down to something reasonable (around 24 char). I believe the use of the (almost) unique combination of timestamp and MAC address, combined with the random digits (probably ten Base 36 digits) will satisfy my need for a key that is close enough to being both random and unique.

I am probably missing some subtleties here, but, if you have an initializion routine on each copy that sets serials in each table to be prefixed with some value unique to that copy of the program, wouldn't you have achieved the same thing?

Copy #1 has serials:

Fred1, Fred2...

Copy 32 has serials

Jim1, Jim2...

Then all you would really have to worry about is:

(a) The program knows it is a new copy when it opens the first time

(: You have a method of generating a unique prefix for that copy. Having some form of user registration on opening in combination with a date stamp would suffice I would think.

Bringing records together from multiple copies would be possible. There would be no extra overhead on the solution generating keys.

You can initialize a copy that already has records without changing any of the existing records, so in the event that some pre-population of the database is required, it does not become an issue.

I used this on my CRM program to give the provision for multiple runtime users who would want to synch up later. I don't think it has ever been used so I won't be pushing it forward in the next rev. But I did take a pretty good look at it at the time and I looked at it again when a client wanted to share data with multiple runtimes in various locations, effectively sending strings of data backand forth and reconstituting them. In both cases it seems that the real deal instead of a runtime + WAN was a better option.

Just some thoughts

Link to comment
Share on other sites

So basically you are debating:

NodeID & RandomID

vs.

NodeID & SerialID

Yup, that's really the bones of it. Whether to use a nodeID or the MAC is almost an arbitrary decision (since the MAC identifies a node), though I can apply the MAC-based key form to all other solutions, not just those that are distributed nodes.

There's also the off-chance that a nodeID gets set incorrectly (or mistakenly), causing records to be mis-identified in the central db. But this needs to be checked and properly handled by the program, regardless of the key system being used, so this shouldn't really be a factor in the decision.

Link to comment
Share on other sites

I am probably missing some subtleties here, but, if you have an initializion routine on each copy that sets serials in each table to be prefixed with some value unique to that copy of the program, wouldn't you have achieved the same thing?

Essentially, yes, and I initially thought that simply using a unique prefix & serial would work perfectly well when syncing distributed dbs. It was only when reading something in the SyncDek documentation that I came across a brief mention of a complication that can arise. When you come to update the db to a new version, presumably you would take a copy of the central db (making sure all data has been synced), import the data into your new version, then duplicate that, set the node-specific details in each copy, and then distribute the copies to the nodes. However, each node would need different 'Next Serial' values in each of the tables, because different numbers of records have been created and deleted at each node. SyncDek recommends prefixing a db version code to the key (as well as nodeID), so that all new keys will be unique to the new version, therefor never conflicting with keys that were created in a previous version (even if a duplicate serial is used). It's a simple enough fix when you're aware of the potential problem, but I like to try to make things as automatic and generic as possible. I've also never liked the idea of having to reset serial increments when importing/updating, even in non-distributed solutions. Call me lazy :

Link to comment
Share on other sites

You can automate key setting for next serial number. I would have a calc that was text to number of the serial. Basically, find the local prefix. If there are any records, sort by the calc field. Go to last record. Set the next serial to Prefix & Last serial +1.If there were no records, just set to prefix & 1

Alternately, yes part of any prefix is the version. Then any new version can start back at Prefix & 1

No you do not want to be going in and hand bombing it, expecially if you are doing this over multiple tables.

Link to comment
Share on other sites

You can automate key setting for next serial number. I would have a calc that was text to number of the serial. Basically, find the local prefix. If there are any records, sort by the calc field. Go to last record. Set the next serial to Prefix & Last serial +1.If there were no records, just set to prefix & 1

Yeah, that's pretty much the standard scripting routine I use for updates in my current solutions. But it's not so easy to make this 'generic.' You need to sort the records, and I don't know of a way to do something like "Sort by field ($pkField)", which means you need to hard code the sort steps for every table. I actually use a couple of CFs to determine the largest number without needing to sort, and then with the help of consistently named pk fields, I can just about make a generic loop that will set the next serial values for all tables in the db. Still... I like the idea of not having to do this at all.

Link to comment
Share on other sites

Unsort?

Effectively sorting by creation order? But you wouldn't assume that the last record necessarily has the highest pk, would you? I mean, it might be a safe assumption in a lot of (probably most) cases, but it's not a generic assumption I would apply across the board. Would you? Imported records would throw off the reliability of this method, wouldn't they?

Link to comment
Share on other sites

I suppose it is a safe assumption, and a clever idea. So simple, yet I'd never thought of doing it that way. (Learn something everyday). Although I can think of situations that could break this, I think all of those situations should not be allowed to happen in a properly controlled and designed system, anyway.

Link to comment
Share on other sites

  • 1 month later...

I'm using Scodigo's PHP plugin to return the result of an "echo com_create_guid();" command which gets a GUID from Windows.

I've tested it with a 7m record table with no duplicates and there's a negligable performance impact.

There is the added cost of the plugin though...

Link to comment
Share on other sites

  • 4 months later...

Just thought I'd add to my post above, that I've found a free way to generate a GUID in Filemaker. You'll need Visual Studio Express edition (Or similar) for this.

Download and install the DotNet2FM plugin from: http://www.dotnet2fm.com/ (A donation to the author would probably be appreciated).

Follow the instructions/tutorials for the plugin to create a new class library function that returns (The code has been butchered a bit by me in Notepad to remove unrelated code, so apologies for the wrapping) :


using System;

using System.Collections.Generic;

using System.Text;

using System.IO;

using System.Xml;

using DotNet2FMInterface;

using System.Net;



namespace fmConnector

{



    public class fm : IPlugin

    {



        public fm() { }



        public string PluginID

        {

            get { return "fmConnector"; }

        }

        public short MethodBase

        {

            get { return 10000; }

        }

        public bool CanIdle

        {

            get { return false; }

        }

        public bool CanPreferences

        {

            get { return false; }

        }

        public void Init(IEnvironment environment) { }

        public void Shutdown(IEnvironment environment) { }

        public void Idle(IEnvironment environment) { }

        public void Preferences() { }

        public enum Methods

        {

            GUID,

            MethodCount

        }

        public short MethodCount

        {

            get { return (short)Methods.MethodCount; }

        }



        public IMethodInfo GetMethodInfo(short nMethodIndex)

        {



            switch ((Methods)nMethodIndex)

            {

                case Methods.GUID:

                    return new MethodInfo("GUID", "GUID()", 1, 1, MethodFlags.MayEvaluateOnServer | MethodFlags.DisplayInAllDialogs);

            }

            return null;

        }



        public bool InvokeMethod(short nMethodIndex, IEnvironment environment, IDataVector parms, IData result)

        {



            switch ((Methods)nMethodIndex)

            {

                case Methods.GUID:

                    return Guid(parms[0], result);

            }

            return false;

        }



        // Guid Method

        private bool Guid(IData name, IData result)

        {

            result.AsText = System.Guid.NewGuid().ToString();

            return true;

        }



    }



}

Compile the library and drop the DLL into the FilemakerDotNet2FM Plugins folder.

You should now see the function available in Filemaker that will return a GUID :

Edited by Guest
Link to comment
Share on other sites

This topic is 5049 days old. Please don't post here. Open a new topic instead.

Create an account or sign in to comment

You need to be a member in order to leave a comment

Create an account

Sign up for a new account in our community. It's easy!

Register a new account

Sign in

Already have an account? Sign in here.

Sign In Now
×
×
  • Create New...

Important Information

By using this site, you agree to our Terms of Use.