I wrote the following as part of my preparation for next week’s second meeting of the NHC Summer Institute in Digital Textual Studies. The post assumes a modest working understanding of network graphs and their terminology. For a primer on humanities network analysis, see the links for my network analysis workshop or, more specifically, see Scott Weingart’s ongoing series Demystifying Networks, beginning, appropriately enough, with his introduction, then his second post about degree, and possibly his post on communities.


In previous work in American Literary History, I argued that reprinted nineteenth-century newspaper selections should be considered as authored by the network of periodicals exchanges. Such texts were assemblages, defined by circulation and mutability, that cannot cohere around a single, stable author. As part of this argument, I demonstrated how social network analysis (SNA) methods might employ large-scale data about reprinting to illuminate lines of influence among newspapers during the period. In that early network modeling, I represented individual newspapers from our reprinting data—at the time drawn primarily from the Library of Congress’ Chronicling America collection—as nodes, connected by edges that represented texts printed in common between papers. Those edges were weighted by frequency of shared reprints. The working assumptions behind those models were these: 1.) the fact that two newspapers reprint this or that text in common says very little about their relationship, or lack thereof, during the period and 2.) when two newspapers printed hundreds, thousands, or even tens of thousands of texts in common, this fact is a strong signal of a potential relationship between them.

A selection from a single cluster in the Viral Data. Each line represents a specific reprint from the larger cluster, which is identified by the ID in the first column. You can browse the cluster data I used for these experiments. These are themselves experimental clusters using a new version of the reprint-detection algorithm, and are not yet suitable for formal publication.

Our data about reprinting in the Viral Texts Project is organized around “clusters”: these are, essentially, enumerative bibliographies of particular texts that circulated in nineteenth-century newspapers, derived computationally through a reprint detection algorithm that we describe more fully in previous publications.[1] From these chronologically-ordered lists of witnesses, we derive network structures by tallying how often publications appear in the same clusters. When two publications appear together in a particular cluster, they are considered linked, with an edge of weight 1. Each subsequent time those same publications appear together in other clusters, the weight of their edge increases by 1; ten shared reprints result in a weight of 10, one hundred shared reprints in a weight of 100. Thus the final network data shows strong links between publications that often print the same texts and weaker links between publications that occasionally print the same texts.
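The tallying described above can be sketched in a few lines of Python. This is only a minimal illustration, not the project's actual pipeline: the function name `build_edge_weights`, the cluster IDs, and the publication names are all hypothetical, and I am assuming that a publication appearing more than once in a single cluster still counts only once toward that cluster's links.

```python
from collections import Counter
from itertools import combinations


def build_edge_weights(clusters):
    """Tally how often each pair of publications shares a cluster.

    `clusters` maps a cluster ID to the list of publications that
    printed that text (hypothetical structure, for illustration).
    """
    weights = Counter()
    for publications in clusters.values():
        # Assumption: count each publication once per cluster, even if
        # it printed the same text more than once.
        unique = sorted(set(publications))
        # Every pair of publications in the cluster gains one unit of weight.
        for a, b in combinations(unique, 2):
            weights[(a, b)] += 1
    return weights


# Hypothetical toy data: three clusters, three publications.
clusters = {
    101: ["paper_a", "paper_b", "paper_c"],
    102: ["paper_a", "paper_b"],
    103: ["paper_b", "paper_b", "paper_c"],
}

edge_weights = build_edge_weights(clusters)
for (a, b), weight in sorted(edge_weights.items()):
    print(a, b, weight)
```

Sorting the names before pairing them keeps each edge undirected, so "paper_a, paper_b" and "paper_b, paper_a" are tallied as the same link.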

He would never suggest the immigrants should be prevented from coming to America, would the famous preacher. To say that, of course, would be un-American. This is a nation of immigrants, after all: a free market of ideas political and religious. Though the famous preacher must bravely say what, after all, must be said: these immigrants are different. Their minds are shackled to institutions too unlike our own. They are “un-accustomed to self-government” and would only be pawns for those seeking to undermine our democracy. Indeed, the very tenets of these immigrants’ faith virtually force them to do their clerics’ bidding and be “easily embodied and wielded by sinister design.” Speaking bluntly (though of course objectively, and resignedly), the famous preacher notes their religion is fundamentally “adverse to liberty.” These immigrants could simply never assimilate to American culture. It’s almost unfair of us to let them try, isn’t it? And while he would never write anything remotely prejudiced, would the famous preacher, isn’t it concerning how the laws of a foreign religion seem to be taking over America? It happened in Boston, he heard. And to be historical for a moment, the famous preacher muses, “the world has never witnessed such a rush of dark minded population from one country to another.” The famous preacher means “dark minded” as “ignorant” or “malicious,” of course: which are just facts, not bigotry. But really aren’t these immigrants “Clouds like the locusts of Egypt…rising from the hills and plains” of foreign lands “to settle down upon our fair fields?” I’m just saying, the famous preacher insists, I’m just saying.

These Catholics have got to be stopped.

This is a pre-print version of this article. The final, edited version appears in American Literary History 27.3 (August 2015). An accompanying methods paper co-written by me, David Smith, and Abby Mullen can be found on the Viral Texts Project site.

I. Introduction[1]

When Louis F. Anderson took over the editorship of the Houma Ceres in 1856, he admitted that he was “not…very distinguished as a ‘knight of the gray goose quill,'” but assured his new readers that “our pen will not lead us into difficulty” because “our ‘principal assistant,’ the scissors, will be called into frequent requisition—believing as we do, that a good selection is always preferable to a bad editorial” (June 28, 1856).[2] Thus, Anderson sums up a set of attitudes toward the production, authorship, and circulation of newspaper content within a system founded on textual borrowing. In the antebellum US context, circulation often substituted for authorship; the authority of the newspaper rested on networks of information exchange that underlay its production. “Nothing but a newspaper can drop the same thought into a thousand minds at the same moment,” Alexis de Tocqueville writes, describing circulation as a technology—like the rail and telegraph—compressing space and time, linking individuals around the nation by “talk[ing] to you briefly every day of the common weal” (111). In both examples, the newspaper’s primary value stems from whom and how it connects.

My remarks today will be drawn from my work on the Viral Texts project at Northeastern University. In brief, I’m working with a colleague in computer science to automatically identify the most frequently-reprinted texts in digitized archives of nineteenth-century newspapers. We have thus far drawn from the Library of Congress’ Chronicling America collection, but are currently expanding the corpora from which we are drawing to include magazines, as well as a broader selection of American and transatlantic newspapers. We have identified nearly half a million reprinted texts from the LoC’s nineteenth-century holdings. The majority of these were reprinted only a few times, but a significant minority were reprinted in 50, 100, or even 200 newspapers from this one archive.

We went into this project in search of the literature, such as newspaper poetry, that flourished in a print culture founded on textual sharing and through a deeply hybrid and intertextual medium. In the broadest sense, I hoped to expand our ideas of which writers resonated with nineteenth-century readers and create new bibliographies of popular but critically-overlooked literature.

On this front the project has been promising. For every reprinted Longfellow poem we find many more by authors such as Elizabeth Akers Allen, Isabella Banks, Charles Monroe Dickinson, Colonel Theodore O’Hara, Emily Rebecca Page, Nancy Priest Wakefield, or John Whitaker Watson—or, perhaps even more likely, by an anonymous author. Such poems circulated within a system of exchanges and selection—newspaper editors cut, pasted, and recomposed content from their exchange partners and sent their papers to be similarly aggregated elsewhere.

But recognizably literary genres have been only a small part of the project. One of the most dramatic outcomes of this work thus far has been to highlight the importance of understudied genres of everyday reading and writing within the ecology of nineteenth-century print culture. These species of writing include political news, travel accounts, squibs, scientific reports, inspirational or religious exhortations, temperance narratives, vignettes, self-help guides, trivia, recipes, and even, to borrow a modern Internet term, listicles, all of which were juxtaposed with poems, stories, and news on the page of the nineteenth-century paper. As a general (and perhaps unsurprising) rule, the most frequently-reprinted pieces are concise, quotable, and widely relatable texts that would have been easy to recontextualize for different newspapers and new audiences—and that could easily fit gaps in the physical newspaper pages, as editors and compositors needed.

My remarks today focus on those genres we might categorize as “information literature”: lists, tables, recipes, scientific reports, trivia columns, and so forth. I want to separate these from news itself, which is certainly a kind of information genre, but which I would mark as stylistically and operationally distinct from the other genres I’ve listed. Here’s one example of information literature, a list of supposed “facts,” primarily about human lives and demographics, which was published under many names in at least 120 different newspapers between 1853 and 1899 (which is approximately one quarter of the nineteenth-century newspapers in Chronicling America).

7 Reasons 19th-Century Newspapers Were Actually the Original Buzzfeed

In March 2013 I had the opportunity to talk about the Viral Texts project for the “Breakfasts at Buzzfeed” speaker series. I gave my talk a gimmicky title worthy of the venue, which I was assured they appreciated rather than resented. It was a lively crowd of employees from around the company, and they asked some insightful questions during the Q&A. Here’s the video. I only wander off frame a few times!

Below I’ve copied the (very rough) text of my talk at MLA 2012, as part of the Society for Textual Scholarship‘s “Text:Image – Visual Studies in the English Major” panel. You can download the accompanying slides here.

Today I want to talk about how mapping using geographic information systems (GIS) software might help us better understand the dynamic world of print culture in the United States before the Civil War—what Meredith McGill calls “the antebellum culture of reprinting.”

At this January’s MLA Convention, I’ll be presenting on The Society for Textual Scholarship‘s sponsored panel, Text:Image; Visual Studies in the English Major (viewing the panel description may require an MLA membership). I’ll discuss “Mapping the Antebellum Culture of Reprinting,” thinking through my experiments with GIS in the past few years, particularly since attending the GIS course at the Digital Humanities Summer Institute this past summer.

So I was thrilled this past week to read William G. Thomas’ talk, “What We Think We Will Build and What We Build in Digital Humanities,” from this year’s Nebraska Digital Workshop, and to learn from the talk about Thomas’ project, Railroads and the Making of Modern America. The project itself is fascinating, and I immediately wondered if some of their data might help me investigate the circulation of “The Celestial Railroad.” I’ve suspected for a while that Hawthorne’s tale—which satirizes uncritical modernizing through the central image of a railroad—ironically may have spread around the country through the railroad system.

Things are moving for “The Celestial Railroad” project. After the slow work of last year—which can be forgiven, I hope, as I was a brand-new faculty member—this year I have two undergraduate assistants helping me transcribe and encode the hundreds of paratexts—the texts that introduced, commented upon, quoted, or invoked what may have been Hawthorne’s most popular early story. We’re building the archive in the background of this website, and I hope to publish most of the “Celestial Railroad” reprints and paratexts this summer.

Which leads me to the question, “What’s next?” I’ve been thinking quite a bit about this over the past year, and have considered a few possible directions for this research. Exploring the extensive reprinting history I’ve uncovered for this one story—a non-canonical story by a hyper-canonical author—has convinced me that similar textual narratives must exist for many stories and poems—both by canonical and by forgotten authors.

