“Many Facts in Small Compass”: Information Literature in C19 Newspapers (MLA15 Talk)

slide 1

Ryan Cordell, Northeastern University

MLA 2015 | Vancouver, BC


slide 2

My remarks today will be drawn from my work on the Viral Texts project at Northeastern University. In brief, I’m working with a colleague in computer science to automatically identify the most frequently-reprinted texts in digitized archives of nineteenth-century newspapers. We have thus far drawn from the Library of Congress’ Chronicling America collection, but are currently expanding the corpora from which we are drawing to include magazines, as well as a broader selection of American and transatlantic newspapers. We have identified nearly half a million reprinted texts from the LoC’s nineteenth-century holdings. The majority of these were reprinted only a few times, but a significant minority were reprinted in 50, 100, or even 200 newspapers from this one archive.
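The detection method itself is not described here, but the core idea behind finding reprints can be illustrated with a much-simplified sketch. This is my own toy illustration, not the Viral Texts pipeline (which must also cope with heavy OCR noise): treat each newspaper page as a set of overlapping five-word "shingles," and flag page pairs whose shingle sets overlap substantially as candidate reprints.

```python
from itertools import combinations

def shingles(text, n=5):
    """Return the set of n-word shingles (overlapping word n-grams) in a text."""
    words = text.lower().split()
    return {" ".join(words[i:i + n]) for i in range(len(words) - n + 1)}

def overlap(a, b, n=5):
    """Jaccard similarity of two texts' shingle sets (0.0 = disjoint, 1.0 = identical)."""
    sa, sb = shingles(a, n), shingles(b, n)
    if not sa or not sb:
        return 0.0
    return len(sa & sb) / len(sa | sb)

def candidate_reprints(pages, n=5, threshold=0.1):
    """Return pairs of page ids whose shingle overlap suggests a shared reprinted text.

    `pages` maps a page id to its (OCR'd) text. The threshold is illustrative.
    """
    return [(i, j) for (i, a), (j, b) in combinations(pages.items(), 2)
            if overlap(a, b, n) >= threshold]
```

Even this naive version captures why the method scales: shingle sets can be hashed and indexed, so candidate pairs can be found without comparing every page to every other page in full.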

We went into this project in search of the literature, such as newspaper poetry, that flourished in a print culture founded on textual sharing and through a deeply hybrid and intertextual medium. In the broadest sense, I hoped to expand our ideas of which writers resonated with nineteenth-century readers and create new bibliographies of popular but critically-overlooked literature.

slide 3

On this front the project has been promising. For every reprinted Longfellow poem we find many more by authors such as Elizabeth Akers Allen, Isabella Banks, Charles Monroe Dickinson, Colonel Theodore O’Hara, Emily Rebecca Page, Nancy Priest Wakefield, or John Whitaker Watson—or, perhaps even more likely, by an anonymous author. Such poems circulated within a system of exchanges and selection—newspaper editors cut, pasted, and recomposed content from their exchange partners and sent their papers to be similarly aggregated elsewhere.

But recognizably literary genres have been only a small part of the project. One of the most dramatic outcomes of this work thus far has been to highlight the importance of understudied genres of everyday reading and writing within the ecology of nineteenth-century print culture. These species of writing include political news, travel accounts, squibs, scientific reports, inspirational or religious exhortations, temperance narratives, vignettes, self-help guides, trivia, recipes, and even, to borrow a modern Internet term, listicles, all of which were juxtaposed with poems, stories, and news on the page of the nineteenth-century paper. As a general (and perhaps unsurprising) rule, the most frequently-reprinted pieces are concise, quotable, and widely relatable texts that would have been easy to recontextualize for different newspapers and new audiences—and that could easily fill gaps in the physical newspaper pages, as editors and compositors required.

slide 4

My remarks today focus on those genres we might categorize as “information literature”: lists, tables, recipes, scientific reports, trivia columns, and so forth. I want to separate these from news itself, which is certainly a kind of information genre, but which I would mark as stylistically and operationally distinct from the other genres I’ve listed. Here’s one example of information literature, a list of supposed “facts,” primarily about human lives and demographics, which was published under many names in at least 120 different newspapers between 1853 and 1899 (which is approximately one quarter of the nineteenth-century newspapers in Chronicling America).

On Ignoring Encoding

Lately we’ve seen a spate of articles castigating the digital humanities—perhaps most prominently, Adam Kirsch’s piece in the New Republic, “Technology Is Taking Over English Departments: The False Promise of the Digital Humanities.” I don’t plan in this post to take on the genre or refute the criticisms of these pieces one by one; Ted Underwood and Glen Worthey have already made better global points than I could muster. My biggest complaint about the Kirsch piece—and the larger genre it exemplifies—would echo what many others have said: these pieces purport to critique a wide field in which their authors seem to have done very little reading. Also, as Roopika Risam notes, many of these pieces conflate “digital humanities” with the DH that happens in literary studies, leaving digital history, archeology, classics, art history, religious studies, and the many other fields that contribute to DH out of the narrative. In this way these critiques echo conversations happening within the DH community about its diverse genealogies, such as Tom Scheinfeldt’s The Dividends of Difference, Adeline Koh’s Niceness, Building, and Opening the Genealogy of the Digital Humanities, or Fiona M. Barnett’s “The Brave Side of Digital Humanities.”

Even taken as critiques of only digital literary studies, however, pieces such as Kirsch’s problematically conflate “big data” or “distant reading” with “the digital humanities,” seeing large-scale or corpus-level analysis as the primary activity of the field rather than one activity of the field, and explicitly excluding DH’s traditions of encoding, archive building, and digital publication. I have worked and continue to work in both these DH traditions, and have been struck by how reliably one is recognized—to be denounced—while the other is simply ignored. The formula for denouncing DH seems at this point well established, though the precise order of its elements sometimes shifts from piece to piece:

  1. Juxtapose Aiden and Michel’s “culturomics” claims with the stark limitations of the Google Ngram Viewer.
  2. Cite Stephen Ramsay’s “Who’s in and Who’s Out,” specifically the line “Do you have to know how to code? I’m a tenured professor of digital humanities and I say ‘yes.'” Bemoan the implications of this statement.
  3. Discuss Franco Moretti on “distant reading.” Admit that Moretti is the most compelling of the DH writers, but remain dissatisfied with the prospects for distant reading.

These critiques are worth airing, though they’re not particularly surprising—if only because the DH community has been debating these ideas in books, blog posts, and journal articles for a long while now. Matt Jockers’ Macroanalysis alone could serve as a useful introduction to the contours of this debate within the field.

More problematically, however, by focusing on Ramsay and Moretti, these pieces ignore the field-constitutive work of scholars such as Julia Flanders, Bethany Nowviskie, and Susan Schreibman. This vision of DH is all Graphs, Maps, Trees and no Women Writers Project. All coding and no encoding.


Mr. Penumbra, Distant Reading, and Cheating at Scholarship

My Technologies of Text course is capping this semester by reading Robin Sloan’s novel, Mr. Penumbra’s 24-Hour Bookstore, which Matt Kirschenbaum deemed “the first novel of the digital humanities” last year. Mr. Penumbra is a fine capstone because it thinks through so many of our course themes: the (a)materiality of reading, the book (and database) as physical objects, the relationship between computers and previous generations of information technology, &c. &c. &c. I will try not to spoil much of the book here, but I will of necessity give away some details from the end of the first chapter. So if you’ve not yet read it: go thou and do so.

Rereading the book for class, I was struck by one exchange between the titular Mr. Penumbra—bookstore owner and leader of a group of very close readers—and the narrator, Clay Jannon—a new bookstore employee curious about the odd books the store’s odd club members check out. In an attempt to understand what the club members are up to, Clay scans one of the store’s logbooks, which records the comings and goings of club members, the titles of the books they checked out, and when they borrowed each one. When he visualizes these exchanges over time within a 3D model of the bookstore itself, visual patterns of borrowing emerge, which seem, when compiled, to reveal an image of a man’s face. When Clay shows this visualization to Mr. Penumbra, they have an interesting exchange that ultimately hinges on methodology.

Omeka/Neatline Workshop Agenda and Links

We’ll be working with the NULab’s Omeka Test Site for this workshop. You should have received login instructions before the workshop. If not, let us know so we can add you.

Workshop Agenda

9:00-9:15 Coffee, breakfast, introductions
9:15-9:45 Omeka project considerations

9:45-10:30 The basics of adding items, collections, and exhibits
10:30-10:45 Break!
10:45-11:15 Group practice adding items, collections, and exhibits
11:15-12:00 Questions, concerns
12:00-1:30 LUNCH!
1:30-2:15 Georectifying historical maps with WorldMap Warp
2:15-3:00 The basics of Neatline
3:00-3:15 Break!
3:15-3:45 Group practice creating Neatline exhibits
3:45-4:00 Final questions, concerns
4:00-5:00 Unstructured work time

Sample Item Resources

Historical Map Resources

Omeka Tutorial

Neatline Tutorials

Model Neatline Exhibits

Representing the “Known Unknowns” in Humanities Visualizations

Note: If this topic interests you, you should read Lauren Klein’s recent article in American Literature, “The Image of Absence: Archival Silence, Data Visualization, and James Hemings,” which does far more justice to the topic than I do in my scant paragraphs here.

Pretty much every time I present the Viral Texts Project, the following exchange plays out. During my talk I will have said something like, “Using these methods we have uncovered more than 40,000 reprinted texts from the Library of Congress’ Chronicling America collection, many hundreds of which were widely reprinted—and most of which have not been discussed by scholars.” During the Q&A following the talk, a scholar will inevitably ask, “you realize you’re missing lots of newspapers (and/or lots of the texts that were reprinted), right?”

To which my first instinct is exasperation. Of course we’re missing lots of newspapers. The majority of C19 newspapers aren’t preserved anywhere, and the majority of archived newspapers aren’t digitized. But the ability to identify patterns across large sets of newspapers is, frankly, transformative. The newspapers that have been digitized under the Chronicling America banner are actually the product of many state-level digitization efforts, which means we’re able to study patterns across collections that were housed in many separate physical archives, providing a level of textual access that would be very difficult, though not impossible, to achieve in the physical archive. So my flip answer—which I never quite give—is “yes, we’re missing a lot. But 40,000 new texts is pretty great.”

But those questions do nag at me. In particular I’ve been thinking about how we might represent the “known unknowns” of our work,1 particularly in visualizations. I really started picking at this problem after discussing the Viral Texts work with a group of librarians. I was showing them this map,

which transposes a network graph of our data onto a map which merges census data from 1840 with the Newberry Library’s Atlas of Historical County Boundaries. One of the librarians was from New Hampshire, and she told me she was initially dismayed that there were no influential newspapers from New Hampshire, until she realized that our data doesn’t include any newspapers from New Hampshire, because that state has not yet contributed to Chronicling America. She suggested our maps would be vastly improved if we somehow indicated such gaps visually, rather than simply talking about them.

In the weeks since then, I’ve been experimenting with how to visualize those absences without overwhelming a map with symbology. The simplest solution, as is almost always the case, appears to be the best.

In this map I’ve visualized the 50 reprintings we have identified of one text, a religious reflection by Nashville editor George D. Prentice, often titled “Eloquent Extract,” between 1836 and 1860. The county boundaries are historical, drawn from the Newberry Atlas, but I’ve overlaid modern state boundaries with shading to indicate whether we have significant, scant, or no open-access historical newspaper data from those states. This is still a blunt instrument. Entire states are shaded, even when our coverage is geographically concentrated. For New York, for instance, we have data from a few NYC newspapers and magazines, but nothing yet from the north or west of the state.

Nevertheless, I’m happy that these maps are helping me begin to think through how I can represent the absences of the digital archives from which our project draws. And indeed, I’ve begun thinking about how such maps might help us agitate—in admittedly small ways—for increased digitization and data-level access for humanities projects.

This map, for instance, visualizes the 130 reprints of that same “Eloquent Extract” which we were able to identify searching across Chronicling America and a range of commercial periodicals archives (and huge thanks to project RA Peter Roby for keyword searching many archives in search of such examples). For me this map is both exciting and dispiriting, pointing to what could be possible for large-scale text mining projects while simultaneously emphasizing just how much we are missing when forced to work only with openly-available data. If we had access to a larger digitized cultural record we could do so much more. A part of me hopes that if scholars, librarians, and others see such maps they will advocate for increased access to historical materials in open collections. As I said in my talk at the recent C19 conference:

While the dream of archival completeness will always and forever elude us—and please do not mistake the digital for “the complete,” which it never has been and never will be—this map is to my mind nonetheless sad. Whether you consider yourself a “digital humanist” or not, and whether you ever plan to leverage the computational potential of historical databases, I would argue that the contours and content of our online archive should be important to you. Scholars self-consciously working in “digital humanities” and also those working in literature, history, and related fields should make themselves heard in conversations about what will become our digital, scholarly commons. The worst possible thing today would be for us to believe this problem is solved or beyond our influence.

In the meantime, though, we’re starting conversations with commercial archive providers to see if they would be willing to let us use their raw text data. I hope maps like this can help us demonstrate the value of such access, but we shall see how those conversations unfold.

I will continue thinking about how to better represent absence as the geospatial aspects of our project develop in the coming months. Indeed, the same questions arise in our network visualizations. Working with historical data means that we have far more missing nodes than do many network scientists working, for instance, with modern social media data. Finding a way to represent missingness—the “known unknowns” of our work—seems like an essential humanities contribution to geospatial and network methodologies.

1. Yes, I’m borrowing a term from Donald Rumsfeld here, which seems like a useful term for thinking about archival gaps, while perhaps not such a useful term for thinking about starting a war. We can blame this on me watching an interview with Errol Morris about The Unknown Known on The Daily Show last night.

Boston DH Consortium Session #3 Breakout Group Notes

For breakout groups in the “Out-of-the-Box” DH Tools session at the Boston-Area DH Consortium Faculty Retreat (Fall 2013):


Oxygen/TEI BP





Creating a Historical Map with GIS

In the next few days I’ll be teaching a few workshops centered largely on teaching participants to georeference historical maps using ArcGIS. I’ll do this first at the Northeastern English Graduate Student Association’s 2013 Conference, /alt, and then at the Boston-Area Days of DH conference we’re hosting at the NULab March 18-19.

We’ll be learning a few things in this workshop:

  1. How to add base maps and other readily-importable data to ArcGIS
  2. How to plot events in ArcGIS using spreadsheet data
  3. How to georeference a historical map in ArcGIS

For that last goal, this step-by-step guide by Kelly Johnston should be your go-to reference. We’ll be following Kelly’s instructions almost to the letter, though we’ll be using different data.
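Under the hood, georeferencing a scanned map amounts to fitting a transformation from pixel coordinates to geographic coordinates using the control points you click; ArcGIS handles this for you, but the math is worth seeing once. The sketch below (my own illustrative code, not anything ArcGIS exposes) fits the simplest such transformation, an affine one, by least squares:

```python
import numpy as np

def fit_affine(pixel_pts, geo_pts):
    """Least-squares affine transform mapping pixel (col, row) to geographic (x, y).

    pixel_pts and geo_pts are parallel lists of control-point pairs (at least
    three, not collinear). Returns a 2x3 matrix A with geo ~= A @ [px, py, 1].
    """
    P = np.array([[px, py, 1.0] for px, py in pixel_pts])  # (n, 3) augmented pixels
    G = np.array(geo_pts, dtype=float)                     # (n, 2) geographic coords
    # Solve P @ A.T ~= G: one least-squares problem per output coordinate.
    A_T, *_ = np.linalg.lstsq(P, G, rcond=None)
    return A_T.T

def apply_affine(A, pt):
    """Transform one pixel coordinate with a fitted matrix."""
    px, py = pt
    x, y = A @ np.array([px, py, 1.0])
    return (x, y)
```

An affine fit handles scaling, rotation, and shear; the higher-order polynomial or spline transformations ArcGIS offers additionally let a warped or distorted historical map "rubber-sheet" onto modern coordinates, which is why more control points generally give better results.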

We’ll be using these files for the lab. This tutorial, prepared for my graduate digital humanities class, walks through the same steps we’ll follow, in case you need to review a step here or later.

A few other worthwhile links:

  • The Spatial Humanities site is a useful clearinghouse of both spatial theory and praxis across a range of humanities fields. Kelly Johnston’s step-by-step above is only one of a growing collection of such resources on the Spatial site.
  • The David Rumsey Historical Map Collection. If you want a historical map with which to practice (or, frankly, for your research), this is an excellent first stop. In short, it’s many thousands of historical maps, provided for free. In order to download high-resolution versions of the maps, you must create a (free) account and log in.
  • Neatline is an incredibly robust Omeka plugin that allows you to create spatial exhibits of your collected materials. Check out some of the demos—it’s really phenomenal stuff. We won’t have time to go over Neatline, but one could, for instance, make use of a map georeferenced in ArcGIS as a base map for a Neatline exhibit.
  • Hypercities is another important spatial humanities platform that makes use of Google Earth and allows users to build “deep maps” of spatial data, historical maps, images, video, and text. Check out some of their collections to see what Hypercities can do. The collections around Los Angeles, Berlin, and Rome are particularly robust.

Finally, two spatial non sequiturs: