Validating data entry

Chalkster · December 1, 2004

I have a couple of fields that must have data entered in a perticular format and would like to find a way of validating the data.

basicall the data must be entered as groups of 3 letters with a space separation between groups

eg LAX NYC MIA

There can be no spaces at the start of the field, there must be only one space between each group and the groups must consist of 3 letters, or a specific symbol (@ or / followed by two letters).

The number or groups entered in one field can be anything from 1 to 40.

I can subdivide the field into 40 data entry fields but since 95% of the time there are less than 5 groups entered it seems a waste of space .

Any way to write a calculation check and validate a full string or will I have to split the data into multiple fields?

-Queue- · December 1, 2004

I haven't tested this too much, but try

Let([ A = "ABCDEFGHIJKLMNOPQRSTUVWXYZ"; W = WordCount(groups)];

Case(

IsEmpty(groups); 1;

W > 40; 0;

PatternCount( groups; " " ) <> W - 1 or Length(groups) <> 3 * W + W - 1; 0;

Length(Filter( Upper(Middle( groups; 2; 2 )); A )) <> 2; 0;

W > 1; Case( Length(Filter( Upper(Middle( groups; 6; 2 )); A )) <> 2; 0;

W > 2; Case( Length(Filter( Upper(Middle( groups; 10; 2 )); A )) <> 2; 0;

W > 3; Case( Length(Filter( Upper(Middle( groups; 14; 2 )); A )) <> 2; 0;

W > 4; Case( Length(Filter( Upper(Middle( groups; 18; 2 )); A )) <> 2; 0;

W > 5; Case( Length(Filter( Upper(Middle( groups; 22; 2 )); A )) <> 2; 0;

W > 6; Case( Length(Filter( Upper(Middle( groups; 26; 2 )); A )) <> 2; 0;

W > 7; Case( Length(Filter( Upper(Middle( groups; 30; 2 )); A )) <> 2; 0;

W > 8; Case( Length(Filter( Upper(Middle( groups; 34; 2 )); A )) <> 2; 0;

W > 9; Case( Length(Filter( Upper(Middle( groups; 38; 2 )); A )) <> 2; 0;

W > 10; Case( Length(Filter( Upper(Middle( groups; 42; 2 )); A )) <> 2; 0;

W > 11; Case( Length(Filter( Upper(Middle( groups; 46; 2 )); A )) <> 2; 0;

W > 12; Case( Length(Filter( Upper(Middle( groups; 50; 2 )); A )) <> 2; 0;

W > 13; Case( Length(Filter( Upper(Middle( groups; 54; 2 )); A )) <> 2; 0;

W > 14; Case( Length(Filter( Upper(Middle( groups; 58; 2 )); A )) <> 2; 0;

W > 15; Case( Length(Filter( Upper(Middle( groups; 62; 2 )); A )) <> 2; 0;

W > 16; Case( Length(Filter( Upper(Middle( groups; 66; 2 )); A )) <> 2; 0;

W > 17; Case( Length(Filter( Upper(Middle( groups; 70; 2 )); A )) <> 2; 0;

W > 18; Case( Length(Filter( Upper(Middle( groups; 74; 2 )); A )) <> 2; 0;

W > 19; Case( Length(Filter( Upper(Middle( groups; 78; 2 )); A )) <> 2; 0;

W > 20; Case( Length(Filter( Upper(Middle( groups; 82; 2 )); A )) <> 2; 0;

W > 21; Case( Length(Filter( Upper(Middle( groups; 86; 2 )); A )) <> 2; 0;

W > 22; Case( Length(Filter( Upper(Middle( groups; 90; 2 )); A )) <> 2; 0;

W > 23; Case( Length(Filter( Upper(Middle( groups; 94; 2 )); A )) <> 2; 0;

W > 24; Case( Length(Filter( Upper(Middle( groups; 98; 2 )); A )) <> 2; 0;

W > 25; Case( Length(Filter( Upper(Middle( groups; 102; 2 )); A )) <> 2; 0;

W > 26; Case( Length(Filter( Upper(Middle( groups; 106; 2 )); A )) <> 2; 0;

W > 27; Case( Length(Filter( Upper(Middle( groups; 110; 2 )); A )) <> 2; 0;

W > 28; Case( Length(Filter( Upper(Middle( groups; 114; 2 )); A )) <> 2; 0;

W > 29; Case( Length(Filter( Upper(Middle( groups; 118; 2 )); A )) <> 2; 0;

W > 30; Case( Length(Filter( Upper(Middle( groups; 122; 2 )); A )) <> 2; 0;

W > 31; Case( Length(Filter( Upper(Middle( groups; 126; 2 )); A )) <> 2; 0;

W > 32; Case( Length(Filter( Upper(Middle( groups; 130; 2 )); A )) <> 2; 0;

W > 33; Case( Length(Filter( Upper(Middle( groups; 134; 2 )); A )) <> 2; 0;

W > 34; Case( Length(Filter( Upper(Middle( groups; 138; 2 )); A )) <> 2; 0;

W > 35; Case( Length(Filter( Upper(Middle( groups; 142; 2 )); A )) <> 2; 0;

W > 36; Case( Length(Filter( Upper(Middle( groups; 146; 2 )); A )) <> 2; 0;

W > 37; Case( Length(Filter( Upper(Middle( groups; 150; 2 )); A )) <> 2; 0;

W > 38; Case( Length(Filter( Upper(Middle( groups; 154; 2 )); A )) <> 2; 0;

W > 39; Case( Length(Filter( Upper(Middle( groups; 158; 2 )); A )) <> 2; 0; 1 ); 1 ); 1 ); 1 ); 1 ); 1 ); 1 ); 1 ); 1 ); 1 ); 1 ); 1 ); 1 ); 1 ); 1 ); 1 ); 1 ); 1 ); 1 ); 1 ); 1 ); 1 ); 1 ); 1 ); 1 ); 1 ); 1 ); 1 ); 1 ); 1 ); 1 ); 1 ); 1 ); 1 ); 1 ); 1 ); 1 ); 1 ); 1 );

1 )

)

TerenceM · December 2, 2004

-Queue- definitely deserves his "Carpal Tunnel" rating for that one.

Since you're running v7, you could also reformat the field to remove any extra spaces, and then validate to make sure all words have three letters.

For reformatting, set the field to have an Auto-Enter Calculation (uncheck "do not replace existing value") with the formula:

TrimAll ( groups ; 0 ; 0 )

This will remove all leading and trailing spaces, and reduce all spaces between words to a single space.

For the validation, you probably want to check for any invalid characters, and also check that all the words have three letters. So you can use the following:

Let ([ Spaces = Middle ( groups ; 4 ; 1 ) & Middle ( groups ; 8 ; 1 ) & Middle ( groups ; 12 ; 1 ) & Middle ( groups ; 16 ; 1 ) & Middle ( groups ; 20 ; 1 ) & Middle ( groups ; 24 ; 1 ) & Middle ( groups ; 28 ; 1 ) & Middle ( groups ; 32 ; 1 ) & Middle ( groups ; 36 ; 1 ) & Middle ( groups ; 40 ; 1 ) & Middle ( groups ; 44 ; 1 ) & Middle ( groups ; 48 ; 1 ) & Middle ( groups ; 52 ; 1 ) & Middle ( groups ; 56 ; 1 ) & Middle ( groups ; 60 ; 1 ) & Middle ( groups ; 64 ; 1 ) & Middle ( groups ; 68 ; 1 ) & Middle ( groups ; 72 ; 1 ) & Middle ( groups ; 76 ; 1 ) & Middle ( groups ; 80 ; 1 ) & Middle ( groups ; 84 ; 1 ) & Middle ( groups ; 88 ; 1 ) & Middle ( groups ; 92 ; 1 ) & Middle ( groups ; 96 ; 1 ) & Middle ( groups ; 100 ; 1 ) & Middle ( groups ; 104 ; 1 ) & Middle ( groups ; 108 ; 1 ) & Middle ( groups ; 112 ; 1 ) & Middle ( groups ; 116 ; 1 ) & Middle ( groups ; 120 ; 1 ) & Middle ( groups ; 124 ; 1 ) & Middle ( groups ; 128 ; 1 ) & Middle ( groups ; 132 ; 1 ) & Middle ( groups ; 136 ; 1 ) & Middle ( groups ; 140 ; 1 ) & Middle ( groups ; 144 ; 1 ) & Middle ( groups ; 148 ; 1 ) & Middle ( groups ; 152 ; 1 ) & Middle ( groups ; 156 ; 1 ) & Middle ( groups ; 160 ; 1 ) ];

Case (

// check for invalid characters

Upper ( groups ) <> Filter ( Upper ( groups ) ; "ABCDEFGHIJKLMNOPQRSTUVWXYZ@/ " ) ; 0 ;

// check for invalid word lengths

Spaces <> Filter ( Spaces ; " " ) ; 0 ;

Length ( RightWords ( groups ; 1 ) ) <> 3 ; 0 ;

WordCount ( groups ) > 40 ; 0 ;

1 )

)

This should accomplish the same thing, but execute a bit faster than Queue's approach. Since most groups are 5 or fewer words, you could subdivide the Spaces calculation into the first five, and then the rest if there are more words, but it doesn't seem to take very long to run all 40 Middle statements. It would be more elegant with a recursive custom function, but I doubt it would be any faster.

-Terence

TerenceM · December 2, 2004

Oops. You'll also need to add a test to make sure the number of words agrees with the total length, to catch examples like "AAA A A AAA". Add this to the Case statement:

WordCount ( groups ) * 4 - 1 <> Length ( groups ) ; 0 ;

-Terence

-Queue- · December 2, 2004

Thanks for the TrimAll suggestion, Terence. I hadn't used that one before.

Are you certain that using the massive Let will be faster than the Case statement though? Since version 7 short-circuits, it would seem to me that if there were five or fewer groups, using this feature would be considerably faster than combining the 40 groups of Middle characters. If there is only one group, the calculation will short-circuit after the W > 1 test and return 1. If there are two groups, it will short-circuit after W > 2 and return 1, etc. Plus, it short-circuits and returns zero if the field is empty, if there are more than 40 groups, or if there are an incorrect number of spaces or total characters in the groups before it ever tests the last two characters of each group.

True, it could be streamlined; but I tend to think the Case would still be faster and more efficient for this situation. What are your thoughts?

TerenceM · December 3, 2004

Yeah, I was deceived at first by the description of TrimAll, which makes it sound likes it primarily useful for dealing with reformatting text around non-roman characters (which isn't high on my list). But its good for much more (in this case, reducing all spaces to single-space, or it can also be used to remove all spaces).

The question of speed got me to tinkering and comparing the different approaches. Here are times for 4 different validation-style calculations:

"Cases" is Queue's nested case function, taken verbatim

"Middles" is my version with 40xMiddles, taken verbatim from above

"ModMid" is similar to Middles, but splits the Middles into two calculations of the first 10, and the remaining 30 if needed

"Recursive" is using a FP7 recursive function

Each calculation was run 1000 times on text fields with different numbers of groups (Len). All the text should generate a "valid" result, requiring the validation to run through the entire sequence. Here are the results:

Len Cases Middles ModMid Recursive

40 0:00:33 0:00:26 0:00:27 0:00:30

20 0:00:17 0:00:21 0:00:21 0:00:15

10 0:00:10 0:00:19 0:00:09 0:00:08

5 0:00:07 0:00:17 0:00:08 0:00:05

3 0:00:06 0:00:17 0:00:07 0:00:04

2 0:00:06 0:00:15 0:00:07 0:00:03

1 0:00:04 0:00:16 0:00:07 0:00:03

(all run on a 500 MHz G3 iBook, so all the times are really negligible for the purpose of validating a single field)

I was a bit surprised by the results. Running through the 40 nested Case statements takes marginally longer than calculating the 40 Middle statements, and that disadvantage is quickly negated when analyzing fewer words/groups (as Queue predicted). Even the ModMiddle approach, which only calculates 10 Middles for the examples with 10 or fewer words/groups, is still a bit slower than the nested Cases. So I'm thinking there may be a hidden speed hit when trying to evaluate a Middle statement beyond the end of the string, since the "Middles" calculation avoids the length/filter/upper/middle calculation on every word which should have given it an advantage.

Note that Queue's formula would also run a bit faster by pre-processing the text with TrimAll, and then just using the nested Case statements to check if each 4th position is a space (combined with a test to make sure the number of spaces = number of words - 1 ).

Its also nice to know that a simple recursive custom function outpaces all the other approaches. They really should have included the ability to write custom functions in the standard version of FP7 -- they're useful for more than just developers.

-Terence

-Queue- · December 3, 2004

Very interesting results, Terence. Thank you for putting your time into testing this.

It's kind of odd that the Custom Function inefficiency nearly doubles each time the length doubles, yet it's still faster than the others until it reaches the largest group.

I just noticed, though, that your function doesn't appear to test whether the 2nd and 3rd character of each group is alphabetic, one of Chalkster's requirements. You could have an @ or / as the 2rd or 3rd character in each group and it would be valid.

It's nice to see evidence that version 7 does indeed short-circuit and that a little forethought can reduce calculation time quite noticeably.

TerenceM · December 3, 2004

Your right -- I neglected to deal with checking the 2nd or 3rd positions for @ or / characters. I know / will register as a word delimiter, so groups of the format A/A will be caught, but I'm not sure what happens with //A or AA/, or if @ will also serve as a word delimiter (I imagine so, but I have Filemaker parsing a 150 MB file in the background right now so I can't check).

I'm still puzzled as to why the nested Case approach isn't substantially hit by all the nested Length(Filter(Upper(Middle statements. That is, when evaluating a string of 40 groups, the Case approach only takes 27% longer than the Middles approach. Each version has to evaluate the following:

Case:

40 Middles

40 Uppers (2 characters each)

40 Filters (2 characters each)

41 Lengths

1 PatternCount (160 characters)

1 IsEmpty

44 logicals

Middles:

40 Middles

2 Uppers (160 characters each, 1 could be eliminated in the Let statement)

2 Filters (one of 160 characters, the other of 40 characters)

2 Lengths

2 WordCounts (1 could be eliminated in the Let statement)

5 Logicals

Evaluating and storing the 40 Middles in the Let statement takes a majority of the execution time (15 seconds). So it may simply be that Logical/Length/Filter/Upper are all much faster than Middle and don't have as big a hit. Also, evaluating 40 Filter/Uppers on 2 characters each may not be much worse than executing a single Filter/Upper on the entire string. I'm also wondering if storing data in the Let statement has a significant performance penalty -- I'll probably play around with these questions more over the weekend.

Also, my guess about efficiency of the Middle statement was wrong -- evaluating Middle statements is no more or less efficient if the data exists (e.g. Middle ( groups ; 4 ; 1 ) and Middle ( groups ; 160 ; 1 ) take the same amount of time to execute on an 11 character string).

Hmmm. I wonder if we've scared Chalkster off yet -- I imagine this is becoming a bit more than the expected response.

-Terence

-Queue- · December 3, 2004

To satisfy your curiousity, any combination with @ or / in the last two characters of a group passes the validation with your calc.

And I just noticed that mine didn't even test the first character of each group for letters, @, or /. I mistakenly read it to mean ANY character followed by two letters.

Looks like both of our ideas need an overhaul.

Chalkster · December 5, 2004

Well guys ... you certainly havent scared me off.... but I must admit I am tryong to digest a lot of what you've written.... I thought I was doing ok with FM until I sawthe replies.... guess I have a hellof a lot to learn!

While grateful for the formulas(extremely grateful) could one of you give me a brief rundown of how it is working.... I got kinda lost along the way...

And thanks again for the work you put in here, much appreciated!

Oldfogey · December 6, 2004

The timing might seem odd but don't forget that a lot depends on how the various functions are implemented. A coder has a bad hair day and we pay for it for years.

Recursiveness usually gets very inefficient with lots of recursions unless one hell of a lot of work has gone into its implementation, specifically aimed at recursion. (Unlikely for FMD, I suspect.)

Queue and Murph, you deserve medals!

-Queue- · December 6, 2004

Okay, here's a modified version of my validation calculation that includes the test for the first character of each group. I didn't use TrimAll, because it seems that you don't want the user to be able to enter additional spaces and therefore they shouldn't be ignored.

Case(

IsEmpty(groups); 1;

Let( W = WordCount(groups);

Case( W > 40; 0; Length(groups) <> 4 * W - 1; 0; PatternCount( groups; " " ) <> W - 1; 0;

Position( groups; "#"; 0; 1 ); 0;

Let([ A = "ABCDEFGHIJKLMNOPQRSTUVWXYZ# "; F = Filter( Upper(Substitute( groups; ["@"; "#"]; ["/"; "#"] )); A )];

Case(

PatternCount( Middle( F; 2; 2 ); "#" ); 0;

W > 1; Case( PatternCount( Middle( F; 6; 2 ); "#" ) or Middle( F; 4; 1) <> " "; 0;

W > 2; Case( PatternCount( Middle( F; 10; 2 ); "#" ) or Middle( F; 8; 1) <> " "; 0;

W > 3; Case( PatternCount( Middle( F; 14; 2 ); "#" ) or Middle( F; 12; 1) <> " "; 0;

W > 4; Case( PatternCount( Middle( F; 18; 2 ); "#" ) or Middle( F; 16; 1) <> " "; 0;

W > 5; Case( PatternCount( Middle( F; 22; 2 ); "#" ) or Middle( F; 20; 1) <> " "; 0;

W > 6; Case( PatternCount( Middle( F; 26; 2 ); "#" ) or Middle( F; 24; 1) <> " "; 0;

W > 7; Case( PatternCount( Middle( F; 30; 2 ); "#" ) or Middle( F; 28; 1) <> " "; 0;

W > 8; Case( PatternCount( Middle( F; 34; 2 ); "#" ) or Middle( F; 32; 1) <> " "; 0;

W > 9; Case( PatternCount( Middle( F; 38; 2 ); "#" ) or Middle( F; 36; 1) <> " "; 0;

W > 10; Case( PatternCount( Middle( F; 42; 2 ); "#" ) or Middle( F; 40; 1) <> " "; 0;

W > 11; Case( PatternCount( Middle( F; 46; 2 ); "#" ) or Middle( F; 44; 1) <> " "; 0;

W > 12; Case( PatternCount( Middle( F; 50; 2 ); "#" ) or Middle( F; 48; 1) <> " "; 0;

W > 13; Case( PatternCount( Middle( F; 54; 2 ); "#" ) or Middle( F; 52; 1) <> " "; 0;

W > 14; Case( PatternCount( Middle( F; 58; 2 ); "#" ) or Middle( F; 56; 1) <> " "; 0;

W > 15; Case( PatternCount( Middle( F; 62; 2 ); "#" ) or Middle( F; 60; 1) <> " "; 0;

W > 16; Case( PatternCount( Middle( F; 66; 2 ); "#" ) or Middle( F; 64; 1) <> " "; 0;

W > 17; Case( PatternCount( Middle( F; 70; 2 ); "#" ) or Middle( F; 68; 1) <> " "; 0;

W > 18; Case( PatternCount( Middle( F; 74; 2 ); "#" ) or Middle( F; 72; 1) <> " "; 0;

W > 19; Case( PatternCount( Middle( F; 78; 2 ); "#" ) or Middle( F; 76; 1) <> " "; 0;

W > 20; Case( PatternCount( Middle( F; 82; 2 ); "#" ) or Middle( F; 80; 1) <> " "; 0;

W > 21; Case( PatternCount( Middle( F; 86; 2 ); "#" ) or Middle( F; 84; 1) <> " "; 0;

W > 22; Case( PatternCount( Middle( F; 90; 2 ); "#" ) or Middle( F; 88; 1) <> " "; 0;

W > 23; Case( PatternCount( Middle( F; 94; 2 ); "#" ) or Middle( F; 92; 1) <> " "; 0;

W > 24; Case( PatternCount( Middle( F; 98; 2 ); "#" ) or Middle( F; 96; 1) <> " "; 0;

W > 25; Case( PatternCount( Middle( F; 102; 2 ); "#" ) or Middle( F; 100; 1) <> " "; 0;

W > 26; Case( PatternCount( Middle( F; 106; 2 ); "#" ) or Middle( F; 104; 1) <> " "; 0;

W > 27; Case( PatternCount( Middle( F; 110; 2 ); "#" ) or Middle( F; 108; 1) <> " "; 0;

W > 28; Case( PatternCount( Middle( F; 114; 2 ); "#" ) or Middle( F; 112; 1) <> " "; 0;

W > 29; Case( PatternCount( Middle( F; 118; 2 ); "#" ) or Middle( F; 116; 1) <> " "; 0;

W > 30; Case( PatternCount( Middle( F; 122; 2 ); "#" ) or Middle( F; 120; 1) <> " "; 0;

W > 31; Case( PatternCount( Middle( F; 126; 2 ); "#" ) or Middle( F; 124; 1) <> " "; 0;

W > 32; Case( PatternCount( Middle( F; 130; 2 ); "#" ) or Middle( F; 128; 1) <> " "; 0;

W > 33; Case( PatternCount( Middle( F; 134; 2 ); "#" ) or Middle( F; 132; 1) <> " "; 0;

W > 34; Case( PatternCount( Middle( F; 138; 2 ); "#" ) or Middle( F; 136; 1) <> " "; 0;

W > 35; Case( PatternCount( Middle( F; 142; 2 ); "#" ) or Middle( F; 140; 1) <> " "; 0;

W > 36; Case( PatternCount( Middle( F; 146; 2 ); "#" ) or Middle( F; 144; 1) <> " "; 0;

W > 37; Case( PatternCount( Middle( F; 150; 2 ); "#" ) or Middle( F; 148; 1) <> " "; 0;

W > 38; Case( PatternCount( Middle( F; 154; 2 ); "#" ) or Middle( F; 152; 1) <> " "; 0;

W > 39; Case( PatternCount( Middle( F; 158; 2 ); "#" ) or Middle( F; 156; 1) <> " "; 0; 1 ); 1 ); 1 ); 1 ); 1 ); 1 ); 1 ); 1 ); 1 ); 1 ); 1 ); 1 ); 1 ); 1 ); 1 ); 1 ); 1 ); 1 ); 1 ); 1 ); 1 ); 1 ); 1 ); 1 ); 1 ); 1 ); 1 ); 1 ); 1 ); 1 ); 1 ); 1 ); 1 ); 1 ); 1 ); 1 ); 1 ); 1 ); 1 ); 1 ) ) ) ) )

And here's an explanation of what it does.

1. Tests if the field is empty and returns a 1 (valid) if true.

2. Tests if the number of groups (WordCount) is greater than 40 and returns a zero (invalid) if true.

3. Tests if the number of characters (Length) is not equal to four times the WordCount, minus 1, and returns a zero if true. Each space should be located at the end of a group, in the 4 * groupnumber position, i.e. the first space in the 4th position (4 * 1), the second space in the 8th position (4 * 2), etc. So the total length of the field is 4 * n - 1, because there should be no space after the nth group.

4. Tests if the number of spaces in the field is not equal to WordCount minus one and returns a zero if true. Two groups should have one space between them. Three groups should have two spaces between them. So, n groups should have n - 1 spaces between them.

5. Tests if the field contains a # and returns zero if true. The @ and / characters will next be converted to #. So we need to ensure that there is no # in the field already.

6. Substitutes # for all instances of @ or /, capitalizes the result, and filters only capital letters, #s, and spaces from the result.

7. Tests if there is a # in the second or third character and returns a zero if true.

8. Tests if there is a # in the sixth or seventh character or if the fourth character is not a space and returns zero if either is true.

Step 8 is repeated for all other second and third group characters and space between the current group and the previous.

If all of the tests pass up to W > n but W is not > n, then a 1 is returned and the field is valid. This is the reason for the succession of ones after the final Case.

Chalkster · December 22, 2004

Now I can follow this ... and it works well... thanks

But to make life more complicated there is one other value that can be entered in the field "???" which if used can only be used by itself without any other groups. So the above works great, I need to add the possibility of a single alternative entry.

I guess I will need to use an "or " somewhere in here but is it before the case or within it?

And thanks to all that originally answered.... I am still somewhat baffled by some points of the discussion of various methods but I have definitely learned something!

-Queue- · December 22, 2004

After

Case(

IsEmpty(groups); 1;

add

groups = "???"; 1;

this will be followed by the

Let( W = WordCount(groups); line.

Chalkster · December 22, 2004

Perfect... thanks !

Sign In

Validating data entry

Recommended Posts

Chalkster

-Queue-

TerenceM

TerenceM

-Queue-

TerenceM

-Queue-

TerenceM

-Queue-

Chalkster

Oldfogey

-Queue-

Chalkster

-Queue-

Chalkster

Create an account or sign in to comment

Create an account

Sign in

Browse

Site Support

Forums

Blogs

Marketplace

Activity

Important Information