Data Mining and Historical Content Analysis: Possibilities within the Brazilian Hemeroteca

Friday, January 6, 2017: 4:30 PM
Room 402 (Colorado Convention Center)
Ian Read, Soka University of America
In 2012, the National Library of Brazil began offering digital and online access to its microfilmed newspaper collection. Today, it appears to be the largest of its kind in the world, with more than 15 million pages of scanned material from about 700 periodical titles, mostly from the nineteenth and twentieth centuries. Its content is keyword searchable, offering extraordinary new possibilities for historians of Brazil, slavery, the Atlantic World, and many transnational subjects. Previously, historians had to travel to the microfilm room of the National Library in Rio de Janeiro to access most of these periodicals, and they had no way to search these texts digitally. This new archive will also create new expectations for historians: editors and peer reviewers may come to expect its use for a wider range of research questions. With such exciting possibilities and daunting expectations in mind, this methodological paper focuses on data mining and historical content analysis within the context of the Brazilian digital Hemeroteca. It presents iMacros and Selenium, two browser-automation programs that may extend search capabilities and data processing. Additionally, this paper invites the audience to think creatively about systematic text analysis. By viewing these digitized newspaper collections as enormous textual databases that relate billions of words by textual proximity, temporality, purpose or political identity of newspapers, and geographic location, this paper demonstrates how new archives do not just answer old questions, but also create new modes of historical inquiry.
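The idea of relating words by textual proximity and temporality can be illustrated with a minimal sketch. It is not the method presented in the paper: the function names, the window size, and the sample sentences below are all assumptions invented for illustration, and the text is assumed to have been retrieved already as plain OCR output. The sketch counts which words appear within a few tokens of a keyword and groups those counts by year.

```python
import re
from collections import Counter


def cooccurrences(text, keyword, window=5):
    """Count words that appear within `window` tokens of `keyword`."""
    tokens = re.findall(r"\w+", text.lower())
    counts = Counter()
    for i, token in enumerate(tokens):
        if token == keyword:
            start = max(0, i - window)
            # Neighbours on both sides, excluding the keyword itself.
            neighbours = tokens[start:i] + tokens[i + 1:i + 1 + window]
            counts.update(neighbours)
    return counts


def counts_by_year(pages, keyword, window=5):
    """Aggregate proximity counts per year from (year, text) pairs."""
    result = {}
    for year, text in pages:
        result.setdefault(year, Counter()).update(
            cooccurrences(text, keyword, window)
        )
    return result


# Hypothetical sample pages, invented for illustration only;
# they are not taken from the Hemeroteca.
pages = [
    (1850, "O vapor chegou ao porto do Rio de Janeiro"),
    (1850, "Partiu do porto o vapor inglez"),
    (1851, "Noticias do porto de Santos"),
]

if __name__ == "__main__":
    for year, counts in sorted(counts_by_year(pages, "porto", window=2).items()):
        print(year, counts.most_common(3))
```

The same grouping key could be swapped from year to newspaper title or place of publication, which corresponds to the other dimensions the abstract names.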
