
Recommended Posts

Posted

Besides data validation (covered in my other thread), what other techniques do you all use to improve data accuracy?

Background

I just finished importing data from our prior six databases into my one new one in FMP. In doing so, I found hundreds of data errors. Some I fixed automatically. Others I simply detected and generated error reports so that we can go back and manually fix them.

Some of that error detection was done with checks that can be mapped to validations. But a lot of the rest was done by comparing duplicate or overlapping data for consistency. Now that I have everything in one database, the redundancy is gone (good), but the ability to double-check data is also gone (bad).

Thus, given that 10-25% of records seemed to have a data entry error, how do I reduce that going forward?

Posted

One option for improving data accuracy is auto-entering the likely correct info and allowing the users to edit from there.

However, I've seen this result in more errors due to lazy or sloppy data entry. For example, we were auto-entering the zipcode that covers about half our clients. However, I began to find that zipcode being used improperly on many addresses not in that city. So, we dumped the auto-entry of the zipcode.
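
To make the idea concrete: an auto-entered default only seems safe when something else in the record backs it up. Here's a rough sketch of that logic in Python rather than as a FileMaker auto-enter calculation (the city and zip values are invented for the example):

    # Illustrative sketch, not FileMaker syntax: only pre-fill the default
    # zipcode when the city actually matches; otherwise leave the field blank
    # so the user has to type it instead of accepting a wrong default.
    # DEFAULT_CITY and DEFAULT_ZIP are hypothetical values.

    DEFAULT_CITY = "Springfield"
    DEFAULT_ZIP = "01101"

    def suggested_zip(city: str) -> str:
        """Return a pre-filled zipcode only when the city supports it."""
        if city.strip().lower() == DEFAULT_CITY.lower():
            return DEFAULT_ZIP
        return ""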

Experiences? Opinions? Recommendations?

Posted

Popup lists are very good for casual users.

But if the bulk of the data is coming in from power users who don't want to move their hands from the keyboard, or there are weird differences between fields... the popup list can actually get in the way and end up causing errors.

Thoughts on this?

Posted

When a User has large volumes of data entry, the most accurate way of capturing the data is to require blind double-entry. It sounds really bad in theory -- but if the data absolutely needs to be correct, e.g. paychecks are dependent upon it, it can work very well.

The User enters a batch of data, keeping it in sequence, then starts again. They are not allowed to see their previous entries. Each field is checked against the original entry, and any *difference* is flagged for manual consideration. Sometimes it's just a misinterpretation of someone's handwriting. Blind double-entry is an old mainframe technique, but I swear it still guarantees 100% (well, 99.9%) accuracy. I've used it in various aspects for the past 20 years -- it actually works!
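
The comparison step itself is nothing exotic. Roughly, in Python rather than FileMaker script steps (field names here are made up for illustration):

    # Sketch of the blind double-entry check: compare the second pass against
    # the first, field by field, and flag any difference for manual review.

    def diff_entries(first: dict, second: dict) -> list[str]:
        """Return the names of fields whose two entries disagree."""
        flagged = []
        for field in first:
            if first[field].strip() != second.get(field, "").strip():
                flagged.append(field)
        return flagged

    original = {"name": "J. Smith", "hours": "38.5", "rate": "12.40"}
    reentry  = {"name": "J. Smith", "hours": "35.8", "rate": "12.40"}
    print(diff_entries(original, reentry))   # ['hours'] -> send to manual review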

If a User enters the exact same information twice, it's usually right. I hope this helps -- I understand your situation all too well.

Posted

I totally agree, although I recently reviewed a publication on keyboarding that Vaughan suggested. Its tests compared this very issue: users swore it was faster to enter data without moving their hands from the keyboard, but in reality it was quicker using checkboxes, etc.

Article: http://www.asktog.com/basics/03Performance.html...

My background still pulls me to full hands-on keyboarding, but I'm trying to *change*. In some respects, shortcut codes are helpful. For instance, one field may need a location, and there are only 4 locations. Instead of a pop-up, I have a location shortcut -- "O" for Office. It's quicker to type an O than to select from a pop-up. Shortcut codes allow the best of both worlds.
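
In rough Python terms (the codes and locations below are only examples), the expansion is just a lookup that rejects anything it doesn't recognize:

    # Sketch of the shortcut-code idea: a single keystroke expands to the full
    # value, and anything unrecognized is treated as an entry error.

    LOCATION_CODES = {"O": "Office", "W": "Warehouse", "H": "Home", "F": "Field"}

    def expand_location(code: str) -> str:
        try:
            return LOCATION_CODES[code.strip().upper()]
        except KeyError:
            raise ValueError(f"Unknown location code: {code!r}")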

I also like to run an 'inconsistency analysis' over a record. It's just a set of logical checks that can point out *possible* problems -- more like warnings than errors.
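
Something along these lines, sketched in Python with invented fields and rules; each check either returns a warning string or nothing, and none of them block the entry:

    # Sketch of an 'inconsistency analysis': run a list of rules over a record
    # and collect warnings for a human to eyeball.

    def inconsistencies(rec: dict) -> list[str]:
        checks = [
            lambda r: "End date before start date" if r["end"] < r["start"] else None,
            lambda r: "Zipcode doesn't look 5-digit" if len(r["zip"]) != 5 else None,
            lambda r: "Minor marked as employee" if int(r["age"]) < 16 and r["employee"] else None,
        ]
        return [msg for msg in (check(rec) for check in checks) if msg]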

Posted

Well, since none of the FMP gurus seem to have any other suggestions for improving data accuracy, in this thread or the Data Validation thread, let me pursue the blind double-entry suggestion...

Have any of you written a separate layout whose sole purpose is blind entry? If so, suggestions would be welcome.

Option A)

Add a RE-ENTER button to the layout.

Duplicate the layout. For each field in the layout, create a matching global. Change all the layout fields to use the globals. Change the RE-ENTER button into a CHECK button. Have the CHECK button compare each global to each field in the record and report what's different... probably asking which value to keep. The RE-ENTER button will simply bring up the new duplicate layout.

Option B)

Avoid having to maintain two versions of the same layout by instead having RE-ENTER create a new record that is marked as being a re-entry of another record. Whenever you are looking at a re-entry record, the RE-ENTER button says "CHECK" instead. The script then compares each field in the record to the record it's a re-entry of... similarly asking you which value you want to keep... and then deletes the re-entry record.

I am leaning towards B. Suggestions?
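
For what it's worth, the check step in Option B boils down to a field-by-field comparison against the original record. A rough Python sketch of the logic (not FileMaker script steps; the field list and the keep/replace prompt are just placeholders):

    # Sketch of Option B's CHECK step, assuming each re-entry record carries
    # the id of the record it re-enters. The caller would write `result` back
    # to the original record and then delete the re-entry record.

    def check_reentry(original: dict, reentry: dict, fields: list[str]) -> dict:
        """Return the reconciled record, asking the user about any mismatch."""
        result = dict(original)
        for f in fields:
            if original[f] != reentry[f]:
                keep = input(f"{f}: keep '{original[f]}' or '{reentry[f]}'? [1/2] ")
                result[f] = reentry[f] if keep == "2" else original[f]
        return result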

Posted

Blind double entry is probably the most accurate way of manual entry of existing printed data that doesn't require any interpretation by the data entry person. However, if it's just someone on the phone taking someone's name and address and entering it in, this won't do much good. Also, for the double entry system you will need two different people to do the entry. If you use the same person, they tend to repeat the same errors. Even with double entry by two different people from a printed form, there will still be occasional errors (from my personal experience).

The question you have to ask is: "What are the consequences of errors in data entry?"

Obviously it varies with each situation. Most critical are fields that will be used as keys in relationships or fields that are used in find requests (especially scripted ones that look for a very specific value). These should be value list entry with strict validation. On the other hand, misspelling a customer's name, while irritating to the customer, isn't likely to cause your database to quit working.

The biggest problem I encounter (and quite frequently too) is when a customer has acquired a contact list that they want to import into FileMaker. The original list probably never had any kind of field validation. The fields likely don't correspond to fields in the FileMaker file (e.g., a single name field vs. separate first, middle, and last name fields). Many records will be duplicates of ones already in the database.

Fixing this is a one time process though. I generally search for duplicates of phone numbers to find the obvious duplicated records. Next, I search for duplicates of city names and omit them. What's left is a list of cities that are unique, and hence likely misspelled. These have to be manually checked. I do the same for street names and other things that are likely to recur in a large set of records. I always manually scan the data to see what other kind of errors commonly occur, and then I do finds and replaces on these records, or special repair scripts if necessary. Then, there's always the final manual check.
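
The duplicate and "unique value" passes are simple to express. A Python sketch of the idea (the phone and city field names are assumptions about the imported data):

    # Sketch of two cleanup passes: exact phone duplicates point to probable
    # duplicate records, and city names that occur only once are candidates
    # for misspellings worth a manual look.

    from collections import Counter

    def probable_duplicates(records: list[dict]) -> list[dict]:
        counts = Counter(r["phone"] for r in records if r.get("phone"))
        return [r for r in records if counts[r.get("phone", "")] > 1]

    def suspect_cities(records: list[dict]) -> list[str]:
        counts = Counter(r["city"].strip().title() for r in records if r.get("city"))
        return sorted(city for city, n in counts.items() if n == 1)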

There are always things that slip through. Every big database system needs scheduled maintenance to fix things that get broken during normal daily use. If you start noting common recurring errors, then you can take steps to prevent them in the future.

The important thing is to design the system with data integrity in mind, even if you don't completely handcuff the user. As long as the system is built on sound principles, you should be able to adapt it when problems occur.
