Saturday, January 7, 2012: 3:30 PM
Chicago Ballroom A (Chicago Marriott Downtown)
The GCP spent more time on error detection and data verification than on any other aspect of the project. That is not surprising, given the extent of data entry required for such a large project. I will walk the audience through the various stages of data verification with concrete, on-screen examples. I will provide a step-by-step review of the types of errors common to most database projects, including those we encountered ourselves. I will explore the options available for reducing errors. I will explain the decisions we made in the following areas: 1) customized data verification in software programs designed for specific projects vs. standard software programs, such as the double-entry "differencing" offered by Microsoft Excel; 2) the virtues and weaknesses of traditional oral/sight review of entered data; 3) the mechanics of "cleaning" the data (i.e., the search for data consistency); 4) the process of correcting errors; and 5) the means by which a project error rate can be determined once all verification and correction procedures have been completed. While some of the GCP errors are data specific (such as a table of marital status by sex that turned up 53 "male" doncellas, a term reserved for unmarried women), our procedures are applicable to nearly any historical database, whatever its size or data types. Indeed, precisely because the GCP spent so much time on error detection and correction, which required numerous procedures and generated many changes, smaller individual projects should find something of use here. Because data verification is the most technical of the procedures discussed in this session, we will provide the audience with hard copies and electronic files of our procedures. I joined the GCP staff in 2002 and served as Assistant Director from 2006 to 2008.
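As an illustration only, here is a minimal Python sketch of three ideas the abstract names: double-entry differencing (comparing two independent keyings of the same records field by field), a consistency rule of the kind that caught the "male" doncellas, and a simple error-rate ratio computed from a post-correction audit sample. The record layout, the field names (`id`, `sex`, `marital_status`) and all function names are hypothetical assumptions. This is not the GCP's actual software or procedure.

```python
from typing import Dict, List, Tuple

# Hypothetical record layout: each record is a dict of field name -> keyed text.
Record = Dict[str, str]


def diff_double_entry(first: List[Record], second: List[Record]) -> List[Tuple[str, str, str, str]]:
    """Compare two independent keyings of the same records, field by field.

    Returns (record_id, field, first_value, second_value) for every disagreement,
    including records present in only one keying. Assumes a shared 'id' field.
    """
    second_by_id = {r["id"]: r for r in second}
    mismatches = []
    for rec in first:
        other = second_by_id.get(rec["id"])
        if other is None:
            mismatches.append((rec["id"], "<record>", "present", "missing"))
            continue
        for field in sorted(set(rec) | set(other)):
            a, b = rec.get(field, ""), other.get(field, "")
            if a != b:
                mismatches.append((rec["id"], field, a, b))
    # Records keyed only in the second pass.
    first_ids = {r["id"] for r in first}
    for rid in second_by_id:
        if rid not in first_ids:
            mismatches.append((rid, "<record>", "missing", "present"))
    return mismatches


def flag_inconsistent(records: List[Record]) -> List[str]:
    """Consistency check: 'doncella' (an unmarried woman) must never be 'male'."""
    return [r["id"] for r in records
            if r.get("sex") == "male" and r.get("marital_status") == "doncella"]


def error_rate(errors_found: int, fields_checked: int) -> float:
    """Share of audited fields still found to be wrong after all corrections."""
    if fields_checked <= 0:
        raise ValueError("fields_checked must be positive")
    return errors_found / fields_checked


if __name__ == "__main__":
    keying_a = [{"id": "1", "sex": "male", "marital_status": "casado"},
                {"id": "2", "sex": "female", "marital_status": "doncella"}]
    keying_b = [{"id": "1", "sex": "male", "marital_status": "casado"},
                {"id": "2", "sex": "male", "marital_status": "doncella"}]
    print(diff_double_entry(keying_a, keying_b))  # [('2', 'sex', 'female', 'male')]
    print(flag_inconsistent(keying_b))            # ['2']
    print(error_rate(3, 1000))                    # 0.003
```

In this sketch, differencing only shows *where* the two keyings disagree. Deciding which value is correct still requires going back to the source document. The consistency rule, in contrast, can catch an error that both typists keyed identically.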