Marking Text for Digital History: From Microhistory to Big Data with the Text Encoding Initiative (TEI)
The rise of digitization has brought historians fresh awareness of the physical and textual dimensions of our analog primary sources; it has also fostered new attention to questions of scale. The acts of photographing, sorting, annotating, aggregating, and uploading content for digital presentation draw attention to the contours of the original and to its place within the larger archival collection. As scholars add layers of information, digitization has the effect of turning all content, whether qualitative or quantitative, into data. Big data, structured data, granular data, metadata: contemporary information systems challenge the digital historian to understand anew how evidence creates meaning.
In The History Manifesto (Cambridge, 2014) Jo Guldi and David Armitage advocate a return to the longue durée seasoned by the insights of microhistory. “When we overlook the details,” they suggest, “questions about the big picture may slip away – no longer answered by data, but answered by speculation with the data used as marginalia” (57). Yet for Jerome McGann, writing from a literary perspective, current digital tools remain unable to capture fully the multiplicity of meanings embedded in all texts (A New Republic of Letters; Harvard, 2014). Such tools have yet to achieve the capacity for self-reflexivity of their traditional counterparts, he argues, “which is to say, for transforming storage into memory, and data into knowledge” (96). If Guldi and Armitage offer historians a reason to build layered insights, McGann cautions that we do not yet have a sufficiently rich digital method to do so.
With these twin inspirations, this session interrogates the origins and presentations of historical data with specific attention to the theory and practice of textual markup using the Guidelines of the Text Encoding Initiative (TEI). A rigorous, robust, plain-text, and archivally sound intellectual framework grounded in twenty years of interdisciplinary development, TEI enables scholars to enrich digital versions of texts with layers of interpretation that allow them to be searched, sorted, and displayed according to supplementary editorial or analytic insights. The session’s four papers explore the use of TEI to create data from archival sources at different scales. The first demonstrates how TEI can capture a book’s multiple publication stages and disrupt notions of authorial intention. The second shows how markup enables the layering of interpretation onto a manuscript diary. The third proposes an extension of TEI into markup for account books to facilitate comparative analysis across time and space. The fourth shows the application of TEI and associated tools in digitizing a large corpus of analog texts: the print and manuscript collections of the U.S. Department of State.
“All text is marked text,” writes McGann (90), challenging our digital markup methods to rise to the fully layered meanings of analog sources. These papers take up McGann’s challenge while contributing to the creation of scaled historical data that may help meet the ambitious goals of Guldi and Armitage.