Saturday, January 7, 2012: 3:10 PM
Chicago Ballroom A (Chicago Marriott Downtown)
The Guadalajara Census Project (GCP) entered over thirteen million data points across 92 variables for 145,449 individuals, 29,009 families, and 26,154 households. The point here is not to glorify the quantity but to emphasize the depth of our experience in coding and data entry procedures. It is here, on the “shop-floor,” that the GCP gained (“earned” is a better word!) invaluable practical experience in what works and what does not. I was the coding and data entry supervisor for the GCP from 2003 through 2006. I will discuss the lessons learned in the following categories: 1) strategies of data coding and data entry; 2) practical procedures for encoding data and for data entry; 3) unnecessary data captured and necessary data lost; 4) personnel qualities and deficiencies; and 5) measuring and enforcing coding and data entry productivity.

While our initial data entry software experience was confined to Excel (a disaster) and, thankfully with more success, SPSS, I will also address how other data entry software (SAS, Stata, Microsoft Access) might deal more effectively (or not!) with the above issues. While most historical database projects will not be as huge, or as complicated, as the GCP, the lessons learned are still applicable, and perhaps even more important to small, individually run projects, for which inefficiencies, costly mistakes, and software “surprises” would be even less welcome than in our multi-phase, adequately funded, and sufficiently staffed project. Data coding and data entry are the workhorses of all database projects and should not be overlooked as somehow too prosaic to bother with, or too obvious to need training or careful thought. The GCP experience should make it unnecessary for beginners in database construction to “reinvent the wheel.”
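To make concrete what “data coding” and entry-time checking involve, the sketch below shows one minimal approach in Python. The variable names and code lists are hypothetical and do not reproduce the GCP’s actual coding scheme or its SPSS-based entry workflow; they illustrate only the general idea of validating coded values as they are keyed.

    # Hypothetical coding scheme: each variable maps to its set of valid numeric codes.
    VALID_CODES = {
        "sex": {1, 2},                      # e.g., 1 = male, 2 = female (invented codes)
        "marital_status": {1, 2, 3, 4, 9},  # e.g., 9 = unknown/illegible (invented)
        "relation_to_head": set(range(1, 21)),
    }

    def validate_record(record):
        """Return a list of problems found in one keyed record."""
        problems = []
        for variable, allowed in VALID_CODES.items():
            value = record.get(variable)
            if value is None:
                problems.append(variable + ": missing")
            elif value not in allowed:
                problems.append(variable + ": " + str(value) + " is not a valid code")
        return problems

    # Example: one mis-keyed value is caught before it reaches the master file.
    sample = {"sex": 1, "marital_status": 7, "relation_to_head": 3}
    print(validate_record(sample))  # ['marital_status: 7 is not a valid code']

The point of the sketch is only that valid codes are written down once and checked at the moment of entry, whatever software (SPSS, SAS, Stata, Access, or otherwise) is used.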