Making Big Data: Historical Financial Records

Saturday, January 4, 2014
Exhibit Hall B South (Marriott Wardman Park)
Kathryn Tomasek, Wheaton College (Massachusetts)
In the current iteration of digital history, much is made of the research potential of big data, but most of the information embedded in historical financial records (HFRs) remains trapped in traditional analog archives. Documentary projects have long tended to exclude HFRs, in part because the labor involved in the transcription and markup, let alone formatting, of such abundant and detailed records has seemed to outweigh their value. Thus while individual scholars might produce datasets for their own research, the raw information on which their narratives are based tends to remain either inaccessible or—even if published online—less open to interchange with other datasets than shared standards would allow.

Large documentary projects in the United States and Europe have begun to turn their attention to methods for transcription and markup of HFRs. The relational database developed by editors of the Papers of George Washington at the University of Virginia offers a model for capturing tabular data as it appears on the manuscript page. In Ireland, editors of the Alcalá Account Book Project have produced a digital edition of more than sixty folios (over 300 pages) of eighteenth-century HFRs from the Royal Irish College of Saint George the Martyr in Alcalá, Salamanca.

Internationally, the de facto standard for encoding digital editions of print and manuscript materials is markup in the eXtensible Markup Language (XML) that follows the Guidelines of the Text Encoding Initiative (TEI). For example, the American Founding Era Collection produced by Rotunda, the digital imprint of the University of Virginia Press, uses TEI.

Markup of HFRs with TEI-compatible XML could present a model for creating harvestable data across multiple projects. This poster offers some examples of suggested markup for simple HFRs as well as a model for stand-off markup of double-entry accounts. The stand-off markup is an effort to reproduce the information embedded in double-entry accounts by modeling each transaction as a sequence of one or more transfers of anything of value from one account to another.
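The transaction model described above can be sketched as a small data structure. What follows is a minimal illustration in Python, not the poster's stand-off markup itself: the class names, field names, account labels, and the sample sale are all invented here for the purpose of the example, and no TEI element names are implied.

```python
from collections import defaultdict
from dataclasses import dataclass
from fractions import Fraction


@dataclass(frozen=True)
class Transfer:
    """One movement of something of value from one account to another."""
    source: str          # account giving up the value (hypothetical label)
    target: str          # account receiving the value (hypothetical label)
    quantity: Fraction   # exact amount, avoiding floating-point rounding
    commodity: str       # a money unit, a good, or labor


@dataclass(frozen=True)
class Transaction:
    """A transaction is a sequence of one or more transfers."""
    date: str
    transfers: tuple

    def __post_init__(self):
        if not self.transfers:
            raise ValueError("a transaction needs at least one transfer")


def net_positions(transactions):
    """Return received-minus-given totals per (account, commodity) pair."""
    totals = defaultdict(Fraction)
    for transaction in transactions:
        for transfer in transaction.transfers:
            totals[(transfer.source, transfer.commodity)] -= transfer.quantity
            totals[(transfer.target, transfer.commodity)] += transfer.quantity
    return dict(totals)


# Invented sample: cloth moves one way, money moves the other.
sale = Transaction(
    date="1828-03-01",
    transfers=(
        Transfer("shopkeeper", "customer", Fraction(2), "yards of cloth"),
        Transfer("customer", "shopkeeper", Fraction(3, 2), "dollars"),
    ),
)
positions = net_positions([sale])
```

Because every transfer debits one account and credits another by the same amount, the totals for each commodity sum to zero across accounts, which mirrors the balancing property of double-entry bookkeeping. Records from separate projects that share such a structure could, in principle, be combined by passing their transactions to the same function.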

TEI-compatible markup allows capture of semantic values within manuscript HFRs, making possible more nuanced representations than those produced by econometric interpretations of the past. Thus standardized digitization of HFRs, a relatively inaccessible genre of texts, has the potential to produce harvestable data that could open new lines of inquiry for economic, social, and cultural history.

Note: Research supported in part by a Start-Up Grant from the Office of Digital Humanities at the National Endowment for the Humanities. Any views, findings, conclusions, or recommendations herein do not necessarily reflect those of the National Endowment for the Humanities. Versions of this poster have been presented at the TEI Members Meeting in 2012, accepted for the Digital Humanities conference in 2013, and proposed for the annual meeting of the Association for Documentary Editing in 2013. We seek to present at the AHA in an effort to reach the broadest possible audience of historians within the United States as we continue to contribute to building the international community of practice interested in digital representations of HFRs.
