Post Code Validation

milanm · March 3, 2014

Hi,

i have a survey database used to collect client personal data including post codes in UK (United Kingdom).

It appears that almost 50% of the time people are getting these postcodes wrong and return invalid data, so they need to be validated somehow.

British Royal Mail issues a data file with over 26 000 000 existing postcodes in UK and i need to validate the postcode field against these 26 000 000 existing records, each time it is populated.

My first thought is to create a table in a separate file with 26 000 000 (more or less) postcode records and make a value list that would contain the records from that table and that field. Than i would make the postcode field that the employees are populating a member of this value list (with 26 000 000 values) which would in theory make it non-acceptable for any other value than one of these correct postcodes.

As i never had a chance to manage such a big piece of data in a FileMaker interaction, i am wondering would it work, is it possible at all to create a value list as big, and would it take a long time.

This project is still on FileMaker Server 10 Advanced running on Windows 2008 server. There are about 40 employees working on the server with no issues so far. Most of the users are working on LAN and this postcode validation will be only for LAN users.

I will give it a try, but any clue or hint would be greatly appreciated.

Thank you

milanm · March 3, 2014

Sorry, i got the wrong info in regards to above - it actually seems to be "only" 1 800 000 post codes in UK area.

comment · March 3, 2014

Wouldn't it be sufficient to validate the format of the post code?

http://en.wikipedia.org/wiki/UK_postcodes#Validation
http://fmforums.com/forum/topic/30104-uk-postcode-validation-field/?p=348984

milanm · March 4, 2014

Not in UK, unfortunatilly, it is not allways the same. Also, some of the codes could be in the right format but still do not have to be existing postcodes (in UK)...

However, i made it in a way i mentioned above and it works ok localy, now i placed the file on the server and testing it there. It would take some time to index these new 2 000 000 records, and from there hopefully it should work just fine.

comment · March 4, 2014

Not in UK, unfortunatilly, it is not allways the same.

What do you mean, not in UK? The two links are dealing specifically with UK postal codes.

Also, some of the codes could be in the right format but still do not have to be existing postcodes

That's true. And some of them could be in the right format and be an existing postcode - and still be wrong. In your place I would ask what is the statistical distribution of such errors. Because the effort required to handle each type is vastly different.

milanm · March 4, 2014

Sorry, what i ment is the UK postcodes are not always in the same format, some may have 3 caracters - space - 3 caracters, while some might have 4 caracters - space - 3 caracters and so on...

The validation proceedure that i implemented takes a lot of time on the server but the same validation against the same postcode file on the local drive is instant.

So, here is how i made it work at the end:

1) Made a separate file with only 1 field (::postcode) containing arround 2 000 000 postcodes which i got from British Roayal Mail, these are all existing postcodes in UK,Scotland, Wales and Islands (133 MB).

2) Created a value list that contains all the values from the field where the postcodes reside (::postcode).

3) PLACED THIS POSTCODE FILE ON EACH PC IN THE OFFICE THAT NEEDS IT (about 20 of them) in the root of the C: drive

4) CREATED A CONNECTION ON EACH WORKING COMPUTER TO THIS FILE.

5) Created a value list inside the working file (which is on the server) that TAKES VALUES FROM THE CREATED EXTERNAL FILE VALUE LIST.

6) Set up the postcode field in the working file to be a member of this value list created on the external file.

And huh, now it works finally, if they import the non existing postcode it instantly pops up with an error message and asks them to revert the field. If they enter the existing postcode it does nothing, just as supposed.

I would like it better if i could have had the postcode file on the server but it just takes to much time (like 5 - 10 minutes) while this way it take NO time (happens strait away).

Thanks for the support "Comment" i appreciate it.

comment · March 4, 2014

Sorry, what i ment is the UK postcodes are not always in the same format, some may have 3 caracters - space - 3 caracters, while some might have 4 caracters - space - 3 caracters and so on...

Yes, I know that. You obviously haven't looked at the thread I linked to.

Rick Whitelaw · March 4, 2014

This thread makes me wonder if there's an algorithm to check possible postal codes much like Luhn's calculation can test the validity of credit card numbers and Canadian Social Insurance Numbers.

David Jondreau · March 4, 2014

This thread makes me wonder if people actually read the responses in it.

Rick Whitelaw · March 4, 2014

This thread makes me wonder if people actually read the responses in it.

Well, David, I HAVE read the previous posts and followed the links. Your wording leads me to believe your comment was directed at me. I wasn't referring to format checking. Luhn's calculation (to which I referred) will, in some data sets, verify whether or not it's possible for a number to have been issued. The Canadian Social Insurance Number is one such data set. As a nine digit number you'd figure it has one less than a billion possibilities. I tested from 200,000,000 to 299,999,999. It turns out that around 10% of the numbers could ever have been issued. This doesn't mean a number WAS issued. However a number that could never have been issued will raise a flag in my payroll system. And yes, it has happened. The calculation also works with Canadian Business Numbers and major credit card companies' card numbers.

I'm not familiar with UK postal codes and don't know if there are restrictions in terms of content. I say this because it's obvious that the issuer of a given code must also subscribe to the same validation scheme before issuing the code as the scheme that will validate it post-issue.

Rick.

comment · March 4, 2014

Luhn's calculation (to which I referred) will, in some data sets, verify whether or not it's possible for a number to have been issued.

In more general terms, you are referring to a check digit. The Luhn formula is just one possible implementation of this feature. The UK postal code does not include a check digit, therefore any combination that conforms to the format COULD have been issued.

Note that a check digit is (intentionally) redundant - it does not carry any real information. The actual payload is contained in the other digits. So naturally, with a 9-digit number there will be only 10^8 valid combinations.

Rick Whitelaw · March 4, 2014

Thanks! Further study required on my part.

David Jondreau · March 5, 2014

I could have been less snarky, I apologize.

With UK postal codes there's a handful of correct combinations of letters and numbers. You could say a UK post code has 5-7 alphanumeric characters. You could validate on that. Or you can validate on the 6 different combinations of alpha and numeric that UK post codes allow. Those are just different layers of validation.

Luhn's calculation is simply another layer of validation that happens to apply to codes that use checksum digits.

In short, yes, there are algorithms that can check the validity of credit card numbers or Canadian Social Insurance numbers. There are rules, and those rules are published.

As you noted, validating a string is no guarantee is has been issued, you need to access the issuer's database to know that. Which is what the OP has, semi-effectively done. In the US at least, new postal codes are issued quarterly I believe. Or maybe just the boundaries of existing codes are changed. Either way, you've got to update that list pretty frequently.

*I prefer the term validation over formatting which for me implies non-relevant information such as using parens around a US area code in a phone number.

milanm · March 6, 2014

Interesting thoughts guys, glad i inspired something.

Comment, i have read the threads you suggested, one of them even before i posted and i am gladly using your code for validating fields where ever needed. The problem here was not to validate the possible combination of codes in UK but to find the best and fastest way to validate a postcode imported in the field against the database of EXISTING postcodes issued by Royal Mail quarterly.

Yes, this is the bugger with the way i resolved it, you have to put the file locally on each working station in order to make it work fast (nobody wants to wait more than few seconds just to validate the postcode), and than do it a few times a year.

I suppose i made it sound more complicated then what it really was, just a simple data management and validation problem, no maths involved.

Fitch · March 6, 2014

You would be able to use the file on the server if you used a different method. It sounds like you're using a field-level validation to test that the postcode is a member of a value list. However, even though the value list is built using indexed data, it still has to be built on the fly -- thus, the delay while it downloads all that data from the server.

Consider a a script trigger. For example, you could use a script trigger when you tab out of the field to put the postcode into a variable, go into find mode, go to a layout based on the postcode list (a blank layout, in form view -- not list), perform the find, and get the found count. This should be very fast on an indexed field, even with millions of records.

Note: If you had FileMaker 13 you'd have the new Perform Script On Server option. In FileMaker 12 you'd have ExecuteSQL. Neither necessarily faster than a good old indexed Find.

milanm · March 7, 2014

Thanks Fitch, this might save the day, looks like it could work well and save me from updating file on every work station every 3rd month.

Will test and hopefully implement it as the final solution if it works fast.

LaRetta · March 16, 2014

In more general terms, you are referring to a check digit. The Luhn formula is just one possible implementation of this feature. The UK postal code does not include a check digit, therefore any combination that conforms to the format COULD have been issued.

Not always, such as UK because it does not include a check digit. I did not know this either.

IOW, a code could match the pattern but not be valid, even such as for US, 90000 would be valid pattern but may not be a true valid USPS-assigned zip code.

What I mean, although Comment expressed it best in post #5 above, it is a matter of degree of importance ... excellent question. David said it as well better than I did.

If you are shipping to someone, you MUST be sure their zip is correct.

Added sentence

Edited March 16, 2014 by LaRetta

Sign In

Post Code Validation

Recommended Posts

milanm

milanm

comment

milanm

comment

milanm

comment

Rick Whitelaw

David Jondreau

Rick Whitelaw

comment

Rick Whitelaw

David Jondreau

milanm

Fitch

milanm

LaRetta

Create an account or sign in to comment

Create an account

Sign in

Browse

Site Support

Forums

Blogs

Marketplace

Activity

Important Information