This is a pre-print version of this article. The final, edited version appears in American Literary History 27.3 (August 2015). An accompanying methods paper co-written by me, David Smith, and Abby Mullen can be found on the Viral Texts Project site.
When Louis F. Anderson took over the editorship of the Houma Ceres in 1856, he admitted that he was “not…very distinguished as a ‘knight of the gray goose quill,’” but assured his new readers that “our pen will not lead us into difficulty” because “our ‘principal assistant,’ the scissors, will be called into frequent requisition—believing as we do, that a good selection is always preferable to a bad editorial” (June 28, 1856). Thus, Anderson sums up a set of attitudes toward the production, authorship, and circulation of newspaper content within a system founded on textual borrowing. In the antebellum US context, circulation often substituted for authorship; the authority of the newspaper rested on networks of information exchange that underlay its production. “Nothing but a newspaper can drop the same thought into a thousand minds at the same moment,” Alexis de Tocqueville writes, describing circulation as a technology—like the rail and telegraph—compressing space and time, linking individuals around the nation by “talk[ing] to you briefly every day of the common weal” (111). In both examples, the newspaper’s primary value stems from whom and how it connects.
Antebellum newspaper pages were replete with anonymous or pseudonymous texts, attributed to other papers or merely described as “making the rounds.” In such a textual environment, the value of widely reprinted snippets derived from their movement through the exchange system, not the genius of individual creators. Like some viral content online today, which can become noteworthy because of its virality, the system of newspaper exchanges produced a kind of feedback loop, in which texts circulated because of their perceived value to readers while that perceived value was often tied to a given piece’s wide circulation. The social and technological operations of the newspaper network often proxied the author function, as the names of source newspapers stood in place of an authorial byline. Through the process of selection and republication, editors appropriated the collective authority of the newspaper system, positioning their publication as one node within larger political, social, denominational, or national networks, and their content as drawn from and contributing to larger conversations across the medium.
The composition and circulation of texts among antebellum newspapers offers a model of authorship that is communal rather than individual, distributed rather than centralized. I propose that an idea of the “network author” accounts for the ways in which meaning and authority accrued to acts of circulation and aggregation across antebellum newspapers. This idea of a network author extends scholarly notions of reprinting, reauthorship, and the social text by identifying composition in terms of writers, editors, compositors, and readers enmeshed in reciprocal, mutually dynamic relationships of reception, interpretation, and remediation. In theorizing a “network author” function in antebellum newspapers I mean not to reinscribe the author’s tombstone. Instead, I hope to bring into focus alternative modes of reading and writing that flourished alongside and informed “the more or less constant rise in social prestige” of the term author through the nineteenth century (Pease 108). While periodical writers were not often accorded the social prestige of authorship—consider Poe’s biting critique of magazine writing in “How to Write a Blackwood Article”—their work nevertheless circulated more thoroughly than that of many of their literary counterparts. Given the myriad ways in which literary authors participated in periodical production and reception—as writers, editors, and certainly as readers—such influence could not have run in a single direction.
Ultimately, the frame for this argument must be comparison, in the mode of “comparative textual media,” as N. Katherine Hayles and Jessica Pressman describe the nexus among literary history, digital humanities, and the history of the book. Hayles and Pressman propose that “genre conventions can be reconceptualized so they are approached through the ways in which they presuppose and draw on different media functionalities” (Kindle Location 177-178). Here I negotiate primarily between two media: nineteenth-century newspapers and the twenty-first-century digitized corpus of those historical newspapers. The digitized book, magazine, or newspaper is not simply a surrogate for the material object, but instead constitutes a new edition of the text with both new affordances—such as the ability to analyze patterns in text strings across the corpus—and limitations—such as the leveling out of scale among objects that, in their printed editions, vary widely in size and format. According to Matthew L. Jockers, the mass of digitized materials now available to literary scholars necessitates a shuttling between macro- and micro-views, as we “strive to understand these things we find interesting in the context of everything else, including a mass of possibly ‘uninteresting’ texts.” For Jockers, the “two scales of analysis . . . should and need to coexist” (8-9). If antebellum circulation was a technology of aggregation and enmeshed social relationships, we can now disambiguate and analyze it—albeit always partially and provisionally—through modern technologies like text mining and visualization. The patterns and models of textuality revealed through computational means return us to the archive with new questions about reprinted texts, their circulation, and the wider system of print culture.
By comparing a machine-derived index of reprinted texts to the details of individual newspapers where they appeared, we can work toward a far-reaching understanding of antebellum reprinting across the macro- and micro-scales.
II. The Culture of Reprinting and the Sociology of Texts
The idea that print culture inheres in networks is not new to scholarship in US literature or book history. In “What is the History of Books?” Robert Darnton describes the “life cycle” of printed books as “a communications circuit that runs from the author to the publisher…the printer, the shipper, the bookseller, and the reader” (67), casting literary production as an act deeply embedded in social and economic relationships. D. F. McKenzie urges bibliographers to move away from exclusive “study of the non-symbolic function of signs” and toward study of “the composition, formal design, and transmission of texts by writers, printers, and publishers; their distribution through different communities by wholesalers, retailers, and teachers; their collection and classification by librarians; their meaning for, and…creative regeneration by, readers” (12). In McKenzie’s framing of bibliography as “the sociology of texts,” identifying and describing networks of relationship emerges as the center of the field’s activity. Similar concern for the social text undergirds a crucial turn in US literary history, as scholars have reconsidered notions of authorship and literary property within what Meredith McGill deems “the antebellum culture of reprinting,” seeking to “recover…the vibrancy and importance of the literature that thrived under conditions of decentralized mass production” (3). Lara Langer Cohen, for instance, reads newspaper columnist and novelist Fanny Fern’s “schizophrenic writing style” as a sign of her mastery of the newspaper medium, which “entails that writers exist iteratively, while bringing them in regular contact with their own circulation” (76, 71). For Cohen, Fanny Fern’s unoriginality constitutes, rather than distracts from, a capacious authorial identity, which includes in its ambit not only those pieces Fern composed, but also the many remediations and imitations that circulated beyond her control. 
These and similar studies trouble romantic notions of the author, pointing instead to complex configurations of work, replication, revision, and attribution across media, including periodical and ephemeral forms such as the newspaper, magazine, and pamphlet.
Even as scholars have ventured beyond the book to think more capaciously about antebellum print culture and authorship, however, their arguments have remained largely bound up with exemplary authors: Fern, Lippard, Yonge, or Hawthorne, Poe, even Dickens. For McGill, this paradox develops in large part out of practical constraints: “[t]exts that circulated without authors’ names attached frequently remain unindexed and untraceable, as do authored texts that are published without their authors’ knowledge or consent.” It is precisely the hyper-canonical status of Hawthorne, Poe, and Dickens that allows McGill “to begin to recover the patterns of reprinting of their texts” (2, 42). Because these authors have well-developed bibliographies, their social texts can be recovered and studied. But we also know that the print culture of the antebellum period was far more textually diverse than our author-centered bibliographies reflect:
Newspapers contained more than battle and political news. They were full of the fragmented or “morselized” information that became particularly popular in the press of the second half of the nineteenth century. Papers provided troves of tidbits and factoids—household hints, information about word origins, geographic one-liners, and scientific or historical or agricultural items. Columns of miscellany asserted the preciousness of facts and raw information. (Garvey 7)
While scholars now recognize this newspaper literature as an important marker of nineteenth-century readers’ interests and priorities, the very “conditions that permitted newspapers and periodicals to play such a role” as a chief means of discourse—“their seriality, abundance, ephemerality, diversity, heterogeneity—posed problems for those who wanted to access their contents” (Mussell 2). When every nineteenth-century newspaper brims with original and reprinted content of all kinds, it is difficult to know where to even begin studying that content. Ellen Gruber Garvey offers one compelling solution to this problem, focusing on the morsels clipped and saved by nineteenth-century readers who through scrapbooking “evinced their trust that these fragments of knowledge were important, and faith that their value would become evident in time” (7). By using what readers saved as her principle of selection, Garvey can identify the “‘morselized’ information” worthy of analysis from the billions printed in the period and demonstrate how readers actively engaged with newspaper content we might otherwise dismiss as ephemeral.
This article proposes a complementary approach, arguing that computational methods are uniquely suited to helping us understand reprinting practices at scale. I draw on Northeastern University’s Viral Texts Project to argue that circulation itself was an essential, organizing technology that mediated experiences of textual production and reception during the antebellum period. To trace the “morselized” information circulating in nineteenth-century newspapers, the Viral Texts Project searches not for individual texts or authors but instead employs methods from natural language processing and computational linguistics to read across the Chronicling America newspaper corpus. Such an approach takes up the challenge of “distant reading” with an aim not to replace textual evidence with graphs, maps, or trees, but to uncover and model new sets of evidence difficult to discern at the level of the individual newspaper. In order to focus here on the interpretive consequences of this work, I have co-written with David Smith and Abby Mullen an article describing our methods in more technical detail. In my pages here, I write in more conceptual terms about how our methods meet literary-historical research challenges and what new insights they provide into antebellum newspaper production and reception.
Like their physical counterparts, most digital archives hide more than they reveal, as keyword searches require prior knowledge of the texts to be discovered and can lead to evidentiary excess. The blinders of search access prove surprisingly crippling because they tend to reinforce existing suppositions—on, say, the dominance of a text that is canonical today—while leaving undiscovered more popular texts that might reveal precisely what we have failed to understand about popular opinion, reading habits, and public debate in the period. Even successful searches in large-scale archives can be problematic. As Maurice S. Lee points out in a recent roundtable on “Evidence and the Archive” in J19, “claims about evidentiary relationships between texts are hard to falsify amid intertextual promiscuity, a dynamic that becomes increasingly irresistible as search engines make connections easier to claim” (164). Lee’s point is well taken, so far as it goes, but I would offer that keyword search is only one, and perhaps not even the most compelling, strategy for accessing the digitized archive, particularly when one’s aim is precisely to recover the intertextual promiscuity of nineteenth-century newspapers.
Repetition—and circulation is a kind of textual repetition with a difference—is a salient pattern that the digitized archive makes more readily discernible. If the primary challenge facing scholars interested in nineteenth-century reprinting is that the newspapers and magazines are “unindexed,” then an algorithm can help build indices useful for approaching these materials. In the Viral Texts Project our algorithm privileges neither authors nor particular topics, instead attempting to automatically discover the texts most frequently reprinted across nineteenth-century newspapers. The resulting data can be considered, in essence, a substantial set of enumerative bibliographies of popular newspaper literature. As a corpus, this index offers a new view into genres and modes of authorship inadequately addressed by literary historians.
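To make concrete what such index-building might involve, the following is a minimal sketch, not the Viral Texts Project's actual algorithm, which the accompanying methods paper describes. It assumes a simple strategy of my own choosing for illustration: break each page's text into overlapping five-word sequences, link any two pages that share several such sequences, and treat each linked group as one cluster of reprints. The function names, the page identifiers, and the thresholds `n` and `min_shared` are all hypothetical, and the sketch makes no attempt to cope with the OCR errors that pervade real digitized newspapers.

```python
from collections import defaultdict
from itertools import combinations
import re


def shingles(text, n=5):
    """Return the set of word n-grams in a text, ignoring case and punctuation."""
    words = re.findall(r"[a-z0-9]+", text.lower())
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}


def cluster_reprints(pages, n=5, min_shared=3):
    """Group page ids whose texts share at least `min_shared` word n-grams.

    `pages` maps a (hypothetical) page id to that page's text.
    Returns a list of clusters, largest first, each a sorted list of page ids.
    """
    # Inverted index: each n-gram points to the pages that contain it.
    index = defaultdict(set)
    for page_id, text in pages.items():
        for gram in shingles(text, n):
            index[gram].add(page_id)

    # Count how many n-grams each pair of pages has in common.
    shared = defaultdict(int)
    for ids in index.values():
        for a, b in combinations(sorted(ids), 2):
            shared[(a, b)] += 1

    # Union-find: merge pages linked by enough shared n-grams.
    parent = {page_id: page_id for page_id in pages}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path compression
            x = parent[x]
        return x

    for (a, b), count in shared.items():
        if count >= min_shared:
            parent[find(a)] = find(b)

    groups = defaultdict(set)
    for page_id in pages:
        groups[find(page_id)].add(page_id)

    # A cluster needs at least two witnesses to count as a reprint.
    clusters = [sorted(g) for g in groups.values() if len(g) > 1]
    return sorted(clusters, key=lambda g: (-len(g), g))


# Toy data: two pages print the same maxims, a third prints something else.
pages = {
    "paper_a": "Keep good company or none. Make few promises. "
               "Drink no kind of intoxicating liquors.",
    "paper_b": "MAXIMS. Keep good company or none; make few promises; "
               "drink no kind of intoxicating liquors!",
    "paper_c": "Of wheat, sixty pounds. Of dried peaches, thirty-three pounds.",
}
print(cluster_reprints(pages))  # [['paper_a', 'paper_b']]
```

Nothing in this sketch consults an author's name or a subject heading; pages are grouped solely by the text they share, which is the sense in which such an approach privileges neither authors nor topics. Each resulting cluster is, in effect, a short enumerative bibliography of one reprinted text.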
Figure 1: This visualization maps the bibliography of one text uncovered in the Library of Congress' Chronicling America newspaper database for the Viral Texts Project. This "Eloquent Extract" reflects on the mortality of humanity in the light of eternity, and is often (but not always) attributed to Nashville editor George D. Prentice. The modern states are shaded here to indicate the extent to which their historical newspapers are available in open-access archives.
Using this method to analyze only newspapers in the Chronicling America collection that were originally published before 1861, we have uncovered approximately 40,000 reprinted texts. A majority of these were reprinted only two or three times, but a significant minority were reprinted in twenty or more newspapers from the Chronicling America collection, and were typically distributed across the country, from New England to South Carolina to Nebraska, and even Hawaii. The “textual clusters” produced by Viral Texts’ algorithm allow us to see that an “Eloquent Extract” of George D. Prentice’s religious reflections was reprinted at least 50 times, Queen Victoria’s message to President James Buchanan at the completion of the Transatlantic Cable was reprinted at least 49 times, and the poem “The Inquiry” by Scottish poet Charles MacKay was reprinted at least 33 times. These results from Chronicling America seem strongly to indicate wider popularity, as our largest clusters of reprinted texts grow considerably when we search for those texts in other periodicals archives.
Figure 2: This visualization maps the bibliography of the same snippet, "Eloquent Extract," across Chronicling America and a range of commercial archives of historical newspapers and magazines. We have hand-tested (through iterative search terms) many of our automatically derived clusters from Chronicling America in a range of other archives. Thus far, widely reprinted clusters from our findings seem, as in this case, to be representative, as we are very likely to find the same pieces widely reprinted in newspapers and magazines collected in other digital archives. In the case of "Eloquent Extract," we automatically identified 50 reprints in Chronicling America, variously titled "The Broken Heart," "A Beautiful Extract," "A Beautiful Reflection," "After Life," "An Eloquent Passage," and "Man's Immortality" (and often attributed in reprintings to Bulwer). Using manual searches, we found 20 full or partial reprintings of this same piece in Readex's America's Historical Newspapers, 66 in ProQuest's American Periodicals Series Online, and 42 in Google Books' nineteenth-century holdings. Some of these duplicate our findings in Chronicling America, but most are new witnesses.
III. Information Literature in Antebellum Newspapers
While frequency of reprinting is not a perfect proxy for popularity or cultural importance, when taken as the locus of scholarly inquiry, it produces a hybrid, decentered set of bibliographies that highlight the prevalence of understudied or ignored genres of everyday reading and writing within antebellum print culture. These species of writing include political news, vignettes, travel accounts, squibs, scientific reports, temperance narratives, self-help guides, trivia, recipes, inspirational or religious exhortations, and even, to borrow a modern Internet term, listicles. That final, prototypical Internet genre is prefigured by pieces in our study such as “Maxims to Guide a Young Man,” a list of practical, moral, and spiritual aphorisms that was reprinted in at least 35 of the pre-1861 Chronicling America newspapers, and that gave young men practical advice in how to succeed: “Keep good company or none,” “Make few promises,” “Drink no kind of intoxicating liquors,” and—perhaps less relatable to modern readers—to “Have no very intimate friends.” Some newspapers appended to the “Maxims” piece personal endorsements from the editor, as in this preface from Cincinnati’s Daily Press: “The following has been handed us by a friend who has carried it in his purse for many years, as a sort of talisman and who regards much of his success in life as the result of a strict adherence to the advice it contains.” In many ways this and other listicles exemplify our early findings: they are concise, quotable, and widely relatable texts that would have been easy to recontextualize for different newspapers and new audiences—and that could easily be made to fit different locations on the newspaper page, as editors and their compositors needed.
No scholar of antebellum periodicals would be shocked at the genres mentioned above, given how they pervade every periodical of the day. Within the context of a particular newspaper, however, such pieces often seem like quirky one-offs. Even when it is clear a particular piece was reprinted, uneven citation practices obscure the full extent of most snippets’ journeys through the newspaper system. By identifying the most widely reprinted texts across a large corpus of antebellum newspapers, however, we can also identify the texts, topics, and genres that had significant cultural purchase, particularly among the middle-class readers who consumed newspapers in increasing quantities through the nineteenth century. Everyday newspaper clippings reflect ideas of value embedded in contemporaneous interest and use, not the longer workings of canonization. With rare exception—like John Greenleaf Whittier’s pastoral poem “The Huskers” (reprinted in at least 29 newspapers)—the most widespread texts we have identified are either anonymously authored or attributed to authors unfamiliar to most nineteenth-century literary scholars. Our most frequently reprinted texts are typically short—a few paragraphs—and offer little context about their aims or composition. These newspaper genres constituted an important segment of antebellum reading and writing practices, but have been obscured by literary historians’ focus on overtly literary genres.
One could easily dismiss such clippings as the most ephemeral ephemera: disposable texts created for a disposable medium, useful filler generated by editors desperate to compose a daily or weekly newspaper. That these texts served such practical and material needs is clear. As Garvey notes, “[o]n large papers, a special ‘exchange editor’ went through other papers for material,” a process that helped local newspapers to expand “by yoking together scattered producers who shared labor and resources by sending their products to one another for free use” (Scissors 29-31). Like modern blogs that use aggregation to produce regular content despite tiny editorial staffs, nineteenth-century newspaper editors exploited systemic reprinting to fill columns. But such material uses do not foreclose the possibility that antebellum readers valued newspaper literature.
Much like the sentimental literature brought to our attention in the past decades by feminist scholars from Jane Tompkins to Amanda Claybaugh, many of the most widely reprinted newspaper genres aspired toward everyday, practical use. Indeed, the second-most frequently reprinted piece in our pre-1861 findings is a recipe for gum arabic starch, which appears first in our findings in the Burlington Free Press (January 28, 1853). A slightly elongated version of this popular recipe in the Nashville Union and American (June 30, 1853) promises it will “impart to shirt bosoms, collars, and other fabrics that fine and beautiful gloss observable on new linens” and thus “should have a place in the domestic scrap-book of every woman who prides herself upon her capacity as a housewife and the neatness of her own, her husband’s, and family’s dress.” The Union’s introduction makes a snide dig at women who would ignore the recipe’s instruction: “if she does not take pride in these things her husband is an unfortunate man.” In other words, this paragraph-long, single-sentence introduction to a recipe reinscribes conventional attitudes about proper gender roles within the home and family. Notably, “Starching Linen” is cast as valuable information worth saving, not ephemera. The article “should” be cut out, to have a place “in the domestic scrap-book of every woman.” In its few lines, this popular clipping demonstrates how widely reprinted periodical texts reflect the values of the larger culture: both the material and informational values associated with such snippets and the sociopolitical values embedded in their content and rhetoric. That such pieces often ended up in readers’ scrapbooks testifies that readers valued “the preciousness of facts and raw information” (Garvey 7).
“Starching Linen” exemplifies perhaps the most pervasive category of frequently reprinted texts uncovered in the Viral Texts Project, which I call “information literature”: lists, tables, recipes, scientific reports, trivia columns, and so forth. I separate these from news itself, which is a kind of information genre, but which is stylistically and operationally distinct from the other information genres. In many ways, information literature seems least explicable from the standpoint of literary history, but the prevalence of such texts alone leads me to argue they are important to a full understanding of nineteenth-century epistemologies. Depending on precisely how one categorizes the texts in the top 100 clusters, information literature accounts for 20-25% of the top 100 most frequently reprinted pieces in our study. The popularity of these snippets no doubt stems in part from their malleability—a squib of interesting statistics requires little to no contextual prose, and could help a compositor fill a small gap on their page. These pieces also instantiate the newspaper’s emerging role as an information broker in nineteenth-century America. We might think of the newspaper’s information literature as a kind of serialized and communally authored compendium of useful knowledge, drawing from and contributing to related genres of the book such as the journal or encyclopedia.
In a piece on the “high medical properties” of the tomato, for instance, newspapers cited a “Dr. Bennett, a medical professor in one of our colleges” and several European professors expounding on the health benefits of the fruit before describing several “methods for preparing this article for diet, which adds to the variety of taste and renders it…agreeable to every individual.” This article operates at two informational registers, first listing scientific and medical facts about the tomato (“1st. That it …is one of the most powerful deobstruents [sic] of the Material Medica, and that in all of those affections [sic] of the liver and other organs where calomel is indicated, it is probably the most effective and least harmful remedial agent known in the profession”) and then offering practical advice for incorporating the fruit into regular domestic routines (“Tomato Omele [sic]. When stewed, beat up a half dozen new laid eggs, the yolk and white separate; when each are well beaten, mix them with the tomato—put them in a pan and beat them up; you have a fine omelet”). A similarly practical table was printed for “some of our farming friends” and listed how many pounds were in a bushel of various agricultural products (“Of wheat, sixty pounds” or “Of dried peaches, thirty-three pounds”).
Other information literature offered newspaper readers facts with even less context. One squib, often headed “Interesting Statistics,” claimed to come from a “gentleman claiming to be a ‘friend of the human race’” who “keeps the run of facts, figures, and babies.” The statistics that follow ramble across a range of demographic topics, including the diversity of human languages (“The whole number of languages spoken in the world amount to 3,064; 587 in Europe, 936 in Asia…”), life expectancy (“The average of human life is about 33 years. One quarter part die previous to the age of 7 years…”), marriage (“Marriages are more frequent after equinoxes”), and military readiness (“The number of men capable of bearing arms is estimated at one fourth of the population”). Another widely reprinted column, often headed with the tautological “Ancient Antiquities,” listed in single sentences the dimensions of ancient cities (“Ninevah was 15 miles by 9, and 40 round, with walls 100 feet high, and thick enough for three chariots”) and monuments (“The temple of Diana at Ephesus, was 425 feet high, to support the roof, it was 200 years in building”). The prevalence of such columns of raw facts in Viral Texts’ findings indicates that such pieces were exceedingly common in antebellum newspapers. While less practical than recipes, these species of information literature contributed to the broad cultural literacy of newspaper readers.
In their introduction to Raw Data is an Oxymoron, Lisa Gitelman and Virginia Jackson argue, “Data need to be imagined as data to exist and function as such, and the imagination of data entails an interpretive base” (3). The many lists, tables, recipes, scientific reports, and related genres in antebellum newspapers are catalysts for imagining the newspaper as data, as information. We might read the frequent reprinting of information literature as a piling up of facts across a range of genres that register at different levels of what we might call empirical truth. The exchange and republication of information literature through the newspaper network—particularly when those acts of exchange were staged through paper-to-paper attribution—built up an idea of newspapers’ citability. In other words, pervasive reprinting of information literature cultivated an idea of the newspaper as a knowledge medium which can itself be cited as an authority, rather than deriving its authority entirely from the book-based genres from which it often drew.
The information literature in American newspapers stemmed from and contributed to the industrialization of knowledge during the nineteenth century. In large part, antebellum newspaper reprinting privileged texts for their edification or usefulness to readers, not their originality, an observation which aligns with Franco Moretti’s positioning of “usefulness” as a central value of middle-class culture in the nineteenth century (Bourgeois Kindle Location 232). Useful knowledge can be operationalized. The informational snippets in newspapers operate in diverse ways. In some cases, they direct physical work, as with the recipe for starch. In other cases, they provide functional signals of broad education, as in the lists of statistics or historical tidbits. Such pieces are useful, in other words, as aids to the rhetoric and appropriate interests of middle-class social and professional life.
In this latter function I trace a connection with other print genres of miscellany, not new to the nineteenth century but increasingly industrialized and available to the middle class, such as the dictionary or the encyclopedia. Chambers’s Information for the People, for instance, sought to be “the poor man’s cyclopedia” through cheap print and serial publication. Information for the People was widely successful, appearing in multiple editions in the UK and US through the nineteenth century, and was touted as “the most striking example yet given of the powers of the press in diffusing useful knowledge.” The encyclopedia would through the nineteenth and early twentieth centuries become an increasingly accessible marker of upwardly mobile, middle-class family life, and in the information genres of the newspaper we can identify a broad attempt to position that medium, too, as an accessible avenue of enhancing one’s (or one’s children’s) social position. In the article “Newspapers,” for instance, we can see editors explicitly claiming their medium as an aid to students. An anonymous writer reports that “The Hon. Judge Longstreet says” he remembers “what a marked difference there was between those of my schoolmates who had, and those who had not access to newspapers.” Newspaper readers, the piece claims, were “always decidedly superior…in debate and composition” because “they had command of more facts” drawn from “a history of current events, as well as curious and interesting miscellany.” Here the news itself becomes “a history of current events” (my emphasis), while the newspaper’s snippets are both “curious” and “interesting” supplements to education.
One of the longest of such advocacy pieces, “The Influence of a Newspaper,” purports also to be the product of a “school teacher, who has been engaged a long time in his profession.” This schoolteacher attests, “I have found it to be a universal fact, without exception, that those scholars of both sexes, and of all ages, who have had access to the newspapers at home, when compared with those who are not, are” better students in nearly every way. Indeed, the piece then becomes a list of the newspaper readers’ virtues; they are first “[b]etter readers,” second “better spellers,” third more knowledgeable about Geography, fourth “better Grammarians,” and so forth. Another article, “Ladies Should Read the Newspaper,” extends these benefits to women, claiming “[i]t is a great mistake to keep a young lady’s time and attention devoted only to the fashionable literature of the day.” Instead, the piece claims, women should “read the newspaper and become familiar with…the present world, to know what it is and improve the condition of it.” Such familiarity will allow the newspaper-reading woman to “have an intelligent conversation concerning the mental, moral, political, and religious improvement of our times.” This piece gestures toward a new and wider role for newspapers in the public sphere, and ends with an exhortation, “Let the whole family—men, women, and children—read the newspapers,” suggesting that doing so makes for better citizens. In such pieces, we can see editors and newspaper writers attempting to account the newspaper, with its ever up-to-date “facts,” as a contributor to both knowledge and morality—it is cast as the new virtuous medium that can build an educated and morally upstanding citizenry.
I cannot in this article discuss all the popular genres reflected in the Viral Texts corpus. Other prominent threads include poems and essays about rural and farming life; regional humor; travel narratives; political news and opinion pieces, particularly around the slavery debate in the Kansas and Nebraska territories; temperance tales; quasi-truthful anecdotes or vignettes; and literary excerpts from longer stories, poems, and novels. In my brief discussion of information literature, I have signaled how consideration of these texts as a corpus exposes patterns of everyday periodical writing and reading. The specific examples I quote are interesting less as independent texts than as exemplars of trends, allowing us to begin thinking about how antebellum newspapers constructed ideas of information across publications and over time. These and similar bibliographies trouble accounts of antebellum print culture that cohere primarily around literary genres such as fiction and poetry. As a corpus, these snippets offer a useful vantage for understanding what circulated, and perhaps thus what signified, within the largest mass medium of the time.
IV. The Network Author
As the prevalence of information literature linked newspapers with other antebellum information genres, we might also compare the philosophies undergirding the composition of dictionaries and encyclopedias—which are also genres of miscellany—and those undergirding nineteenth-century newspaper exchanges. In “Wikipedia and Encyclopedic Production,” Joseph Reagle and Jeff Loveland describe copying among encyclopedias in the eighteenth and nineteenth centuries as a normative practice. They note, for instance, that prominent eighteenth-century encyclopedist Ephraim Chambers declared it “idle to pretend any thing of Property in Things of this Nature.” In this quote we might identify a precursor to notions of “raw data”—an idea that the encyclopedia’s articles merely report “facts” that exist independently of the writer and so cannot be owned. Reagle and Loveland’s claim that American publishers would justify their piracy of British encyclopedias as a service to their “knowledge-hungry” compatriots aligns with Meredith McGill’s arguments about the nineteenth-century “culture of reprinting” in the US, in which unregulated reprinting was defended as necessary to a democratic and equal society (1299). Such literature was composed incrementally, by a community of writers and editors in a network.
Such communally composed texts challenge scholars’ continued focus on the individual author as the organizing principle of antebellum print culture. Though we do not yet have precise statistics, a preponderance of the pieces we have uncovered in Viral Texts circulated without an authorial name attached. Such anonymity was endemic to antebellum systems of reprinting. “Printers’ Proverbs,” an evocative, satirical (and itself widely reprinted) snippet, highlights the complex relationships among writing, editing, and reprinting during the period. Proverb three advises printers in faux-King James English that “[i]t is not fit that thou should ask of him [the newspaper editor] who is the author of an article upon subjects of public concernment; for his duty requires him to keep such things unto himself.” However playfully, the very idea of authorship is here kept secret, subsumed within a system that includes (presumably) a writer, an editor, and a printer, as well as the compositors and readers mentioned elsewhere in the proverbs. Elizabeth Maddock Dillon has referred to the “principles of assemblage” replacing the author function in these articles. By forcing our attention to these principles of assemblage, these texts uniquely highlight the essential interplay of literary content and its media.
Viral Texts’ large-scale bibliography of newspaper literature offers an alternative model of antebellum authorship: the network author. As a frame, the network author allows us to speak of “textual clusters”—loose bibliographies of composition, recomposition, and even responses, like parodies—as distinct textual events that can be studied and compared without an author as the central organizing trope. The idea of the network author extends MacKenzie’s notion of the social text, focusing on how texts circulate and are used rather than on their creation. The network-authored text, in short, contains multitudes, comprising both traditional bibliographic witnesses and a host of “reception items” that speak to a given text’s social life and rhetorical power. To read the newspaper was to tap into an “imagined community,” as Benedict Anderson argues. But where Anderson writes of “a specific imagined world of vernacular readers” (63), I want to suggest too that antebellum newspaper readers were accessing an imagined, collective authorial presence. Each reprinted text linked those who read it to another newspaper and other readers, within a complex system of millions of such links.
This network effect can be usefully modeled at the macro scale. Using the 40,000+ antebellum reprints from the Viral Texts project, I have graphed the connections among pre-1861 newspapers in the Chronicling America archive using methods derived from Social Network Analysis (SNA). If we think of these reprinted texts as a kind of common property among periodicals, then we can trace the movement and expression of that common property—who was trading with whom, how frequently, and in what order. Shared texts reveal not only the topics of interest to antebellum readers, then, but also the technical and social structures that enabled their exchange. During the nineteenth century, “temporal and spatial divides were being managed and collapsed in order to engineer mass experiences with others (elsewhere)” (Loughran 347). For Leon Jackson, such connections were fostered first among individual editors through newspaper exchange networks and then more “commercially and impersonally” through technologies such as telegraphy and news agencies (48, 123, 140). By considering reprinting practices at a macro scale, we can begin to model the imagined and practical networks that gave force to reprinted newspaper pieces. In particular, SNA might help us give flesh (or perhaps paper) to the idea of the network author by highlighting the communities of publications that most actively shaped textual exchange during the period.
Figures 3 and 4: This network graph and detail illustrate relationships and influence among Chronicling America newspapers prior to 1861. The circles (nodes) represent individual newspapers. The circles and newspaper names are sized by the weighted degree of that newspaper, which in this case is determined by how many reprinted texts it shares in common with other newspapers in the network. The lines between the newspapers (the edges) represent shared reprints. The thicker a given line is, the more texts the two newspapers it connects share in common. In this graph, even the thinnest lines represent more than one hundred shared reprints between publications; the thickest lines represent thousands of shared texts before 1861. The shading of each node indicates the centrality of that node, or its influence within the network. These graphs were created using the Gephi open graph visualization tool. An interactive version of this graph can be found on the Viral Texts Project website.
That two newspapers reprinted a few articles in common does not tell us much, but when two newspapers printed hundreds or even thousands of texts in common, we might infer a connection that demands closer archival scrutiny. A network graph such as the one in Figures 3 and 4 visualizes all of the texts shared between and among publications. The weight of connection between any two newspapers in the graph is determined by how many reprinted texts they share in common in our findings. Modeling our corpus of reprinted texts as a network allows reprinted texts with both wide and more circumscribed circulations to add to our understanding of the newspaper reprinting system, as even a text shared by only two newspapers increases (albeit slightly) the weight of those papers’ connection in the graph. Looking at the data underlying this visualization, we can see that the edge with the highest weight—which means, in this case, the most shared texts identified before 1861—runs between the Nashville Union and Daily Dispatch (Richmond, Virginia) newspaper families. Other strong edges run between the New-York Daily Tribune, Indiana State Sentinel (Indianapolis, Indiana), and Ottawa Free Trader (Ottawa, Illinois) families. Surprisingly, Brownlow’s Knoxville Whig has the highest betweenness centrality in this network, which means that the shortest paths between other nodes frequently pass through it. The newspapers with the next-highest betweenness centrality are the Vermont Watchman (Montpelier, Vermont), Athens Post (Athens, Ohio), and the Evening Star (Washington, D.C.). In terms of our reprinting data, then, these papers might be thought of as information brokers. While these were not necessarily the newspapers in which popular newspaper literature originated, popular snippets were likely to have circulated through them and thus out to the publications on their exchange lists.
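The edge weighting described in this paragraph can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the project's actual pipeline: the function name `shared_text_weights` and the toy `clusters` data are hypothetical, and each cluster is assumed to reduce to the set of newspaper families that printed a given text.

```python
from collections import Counter
from itertools import combinations


def shared_text_weights(clusters):
    """Count, for every pair of newspapers, how many reprinted texts they share.

    `clusters` is an iterable of collections of newspaper names, one per
    reprinted text (a hypothetical simplification of the cluster data).
    """
    weights = Counter()
    for papers in clusters:
        # Deduplicate so a paper that printed a text twice counts once,
        # and sort so each undirected pair has a single canonical key.
        for a, b in combinations(sorted(set(papers)), 2):
            weights[(a, b)] += 1
    return weights


# Toy data, invented for illustration only.
clusters = [
    {"Paper A", "Paper B", "Paper C"},  # a widely reprinted text
    {"Paper A", "Paper B"},             # a text shared by only two papers
    {"Paper B", "Paper C"},
]
weights = shared_text_weights(clusters)
for pair, count in sorted(weights.items()):
    print(pair, count)
```

In this sketch a text shared by only two papers adds one to a single edge, while a text printed in three papers adds one to each of the three edges among them, which is how both narrowly and widely circulated texts contribute to the graph.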
|Rank||Weighted Degree||Betweenness Centrality||Eigenvector Centrality|
|1||New-York Daily Tribune||Brownlow's Knoxville Whig||Evening Star|
|2||Nashville Union||Vermont Watchman||Mountain Sentinel|
|3||Daily Dispatch||Athens Post||Nashville Patriot|
|4||Ottawa Free Trader||Evening Star||Plymouth Banner|
|5||Glasgow Weekly Times||Perrysburg Journal||Vermont Watchman|
|6||Sunbury American||Edgefield Advertiser||Ottawa Free Trader|
|7||Athens Post||Home Journal||Democratic Banner|
|8||Edgefield Advertiser||Democratic Banner||Sunbury American|
|9||Jeffersonian Republican||Sunbury American||Burlington Free Press|
|10||Raftsman's Journal||Burlington Free Press||Athens Post|
Figure 5: The ten highest-ranked newspaper families from the Chronicling America archive by Weighted Degree, Betweenness Centrality, and Eigenvector Centrality.
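To make the difference between the first two columns concrete, the following Python sketch computes weighted degree and betweenness centrality for a toy graph. The graph and node names are invented for illustration. The betweenness function follows Brandes's standard algorithm and treats edges as unweighted when finding shortest paths, which is an assumption here; the article does not specify the settings used in Gephi.

```python
from collections import defaultdict, deque


def weighted_degree(weights):
    """Sum the weights of all edges touching each node."""
    degree = defaultdict(int)
    for (a, b), w in weights.items():
        degree[a] += w
        degree[b] += w
    return dict(degree)


def betweenness(weights):
    """Unnormalized betweenness centrality (Brandes), ignoring edge weights."""
    adj = defaultdict(set)
    for a, b in weights:
        adj[a].add(b)
        adj[b].add(a)
    score = dict.fromkeys(adj, 0.0)
    for s in adj:
        order = []                      # nodes in order of discovery
        preds = {v: [] for v in adj}    # predecessors on shortest paths
        sigma = dict.fromkeys(adj, 0)   # number of shortest paths from s
        sigma[s] = 1
        dist = {s: 0}
        queue = deque([s])
        while queue:                    # breadth-first search from s
            v = queue.popleft()
            order.append(v)
            for w in adj[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        delta = dict.fromkeys(adj, 0.0)
        for w in reversed(order):       # accumulate path dependencies
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                score[w] += delta[w]
    # Each undirected pair was counted from both endpoints.
    return {v: c / 2 for v, c in score.items()}


# Toy graph: A and B share many texts, but only Hub connects to C.
toy = {("A", "Hub"): 3, ("B", "Hub"): 1, ("C", "Hub"): 2, ("A", "B"): 5}
degree = weighted_degree(toy)
between = betweenness(toy)
print(degree)
print(between)
```

In the toy graph, A has the highest weighted degree because of its heavy exchange with B, yet Hub has the highest betweenness because every shortest path to C runs through it. The same distinction allows one paper to lead the degree column while a different paper leads the betweenness column.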
This preliminary network modeling has already suggested new ideas about US print culture before the Civil War. One small but telling example of how scale might shift scholarly attention lies in how our network analyses of reprinting have pointed toward newspapers in understudied cities in the South and Midwest as important brokers of textual exchange during the period and as more influential than scholarship has registered. Because Chronicling America comprises smaller, state-level digitization efforts, its holdings reflect a wider notion of antebellum publishing than much scholarship, which has focused disproportionately on publishing centers, like Boston, New York, and Philadelphia. Yet when our reprinting data is modeled as a network, it is striking how papers across the US, including the South and Midwest (then West), are revealed as central nodes in the wider network. For instance, the newspaper families with the highest degree in this network—which is a measure of how many connections run to and from a given node—include the New-York Daily Tribune, Nashville Union, and Daily Dispatch, but also the Glasgow Weekly Times (Glasgow, Missouri), Sunbury American (Sunbury, Pennsylvania), and Edgefield Advertiser (Edgefield, South Carolina). Partial as these findings may be, given the uneven coverage of the Chronicling America newspaper data, they still suggest that influence across the nineteenth-century print network was far more distributed than scholars have typically assumed and that print culture was much more diffuse and decentered.
These ideas of a diffuse print culture are bolstered by the striking density of this network. Rather than clustering into distinct communities of closely aligned publications, the entire network clusters quite closely together. The visualizations above resemble hairballs because most of the network’s nodes connect to most of the network’s other nodes, which creates a strong attractive force in SNA graphs. The nodes in this graph have very low eccentricity, a measure of the greatest distance between a given node and any other node in the graph. In this graph, every newspaper family is three steps or fewer removed from every other newspaper family. While there are some identifiable groups of even denser alignment, there is a high level of connection—which, in this case, means a large number of common texts—among publications across the entire network. I cannot, of course, make claims of direct textual transmission from these connections in the network graph. However, the graph does illustrate how thoroughly reprinted texts permeated the newspaper system, from the largest urban print centers to small rural towns.
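Eccentricity can be illustrated with a short breadth-first search in Python. The sketch below is only an illustration: the four-node chain is invented, and the function assumes a connected graph, since in a disconnected graph it would report only the distance to the farthest reachable node.

```python
from collections import deque


def eccentricity(adjacency, source):
    """Greatest shortest-path distance, in steps, from `source` to any reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        v = queue.popleft()
        for w in adjacency[v]:
            if w not in dist:
                dist[w] = dist[v] + 1
                queue.append(w)
    return max(dist.values())


# Toy chain A - B - C - D, invented for illustration only.
adjacency = {
    "A": {"B"},
    "B": {"A", "C"},
    "C": {"B", "D"},
    "D": {"C"},
}
ecc = {node: eccentricity(adjacency, node) for node in adjacency}
diameter = max(ecc.values())  # the largest eccentricity in the graph
print(ecc, diameter)
```

Even this sparse chain has a largest eccentricity of three steps. That a graph of many newspaper families connected by tens of thousands of reprints has the same bound indicates how densely its nodes are linked.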
A similar density can be demonstrated by measuring the Eigenvector centrality of the network’s nodes. Eigenvector centrality measures the influence of a given node within the network based on its connections with other high-scoring nodes, similar to the way web pages are ranked in Google’s search results. By this measure, the newspapers with the highest centrality in Viral Texts’ findings are the Evening Star, Mountain Sentinel (Ebensburg, Pennsylvania), Nashville Patriot, and Plymouth Banner (Plymouth, Indiana)—but only by a tiny amount. In other words, most of the newspapers in this network are roughly as influential as most of the other newspapers, because they are all so frequently connected by shared texts. Such density is telling because of the comparative sparseness of our newspaper data. The Chronicling America newspapers were digitized through distinct state-level grants, not under a national plan to digitize closely related publications. Network graphs of reprinting both within and across those state collections show significant cross-publication among the majority of antebellum newspapers, even when the source newspapers are chosen, if not at random, then at least without an overarching plan. In other words, the dense connections between these newspapers seem organic to the newspapers themselves, not an artifact of selection, particularly when those connections bridge the state boundaries that determined modern digitization efforts.
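The observation that the leading papers are ahead "only by a tiny amount" follows from how eigenvector centrality behaves in dense graphs, which a small Python sketch can show. This is a minimal power-iteration illustration and not Gephi's implementation: the function name, the iteration count, the scaling of the top score to 1.0, and both toy graphs are assumptions made for the example.

```python
def eigenvector_centrality(weights, iterations=100):
    """Approximate eigenvector centrality by power iteration.

    `weights` maps undirected pairs (a, b) to edge weights. Each round, a
    node's score becomes its old score plus the weighted sum of its
    neighbours' scores; keeping the old score stabilizes the iteration
    without changing the ranking. Scores are rescaled so the top node is 1.0.
    """
    nodes = sorted({n for pair in weights for n in pair})
    score = dict.fromkeys(nodes, 1.0)
    for _ in range(iterations):
        new = dict(score)
        for (a, b), w in weights.items():
            new[a] += w * score[b]
            new[b] += w * score[a]
        top = max(new.values())
        score = {n: value / top for n, value in new.items()}
    return score


# Dense toy graph: every node shares one text with every other node.
dense = {("A", "B"): 1, ("A", "C"): 1, ("A", "D"): 1,
         ("B", "C"): 1, ("B", "D"): 1, ("C", "D"): 1}
# Sparse toy graph: three papers connected only through one hub.
star = {("Hub", "X"): 1, ("Hub", "Y"): 1, ("Hub", "Z"): 1}

dense_scores = eigenvector_centrality(dense)
star_scores = eigenvector_centrality(star)
print(dense_scores)
print(star_scores)
```

In the dense toy graph every node receives the same score, whereas in the star graph the hub clearly outranks the papers on its periphery. Nearly equal scores across a real network therefore suggest a structure much closer to the first case than to the second.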
I would suggest that these SNA network graphs complement, from the macro scale, notions about reprinting and authority we find in editors’ statements and reprinted pieces. Antebellum editors composed their newspapers with both “scissors and the quill,” borrowing from and contributing to regional, political, religious, and even national newspaper networks. Editors and their readers recognized their individual publications as nodes within those larger networks. Editors frequently preceded (or immediately followed) reprinted content with an attribution of the clipping’s source, naming another publication where we might expect an author’s byline. When a reprinted piece was marked, for instance, as taken “from the Nashville Union,” or “New York Daily Tribune,” those attributions signaled for readers the reach and connectedness of their newspaper, while the authority of the reprinted piece itself—its truth, or its usefulness, or its entertainment value—was vested in its circulation.
While the network graph reflects a modern idea of how relationships can be discovered and expressed through data, I also suggest that antebellum readers came to their newspapers with a notion of connectedness not unlike that expressed in such graphs. Antebellum readers neither expected nor particularly prized original content in their newspapers, valuing instead selection and aggregation as editorial acts connecting them to a wider public sphere. Recall my epigraph from Louis F. Anderson “that a good selection is always preferable to a bad editorial” (June 28, 1856). For Anderson, reprinting is a method for bringing stronger writing into his newspaper than he could produce on his own. In this way, Anderson situates himself as a savvy curator and aggregator, lauding the collective wisdom of the newspaper network as superior to his individual judgment and ability.
Antebellum editors frequently touted careful selection over original composition through both serious and satirical commentary on their medium. Cheeky complaints about stolen scissors appear in many newspapers. In “A Scissorless Editor,” the theft of an editor’s scissors is lamented as “a sad dilemma,” because “[a]n editor unscissored is something like a dragoon unhorsed,” while a squib in the Edgefield Advertiser ran in full: “The Cornered Editor.—‘Oh Jerusalem! here’s a nice fix! An original article to write, and somebody’s stolen the scissors!’” (February 19, 1852). In a more serious vein, the Fremont Journal claimed, “[d]eprive an editor of his exchanges, shut off his mails for a week, and you take away from him his very sustenance, and withdraw from him all that makes his paper interesting.” While privileging reprinted content over original and “local news,” this piece points to the technologies of circulation that sustained local newspapers, who seemed to thrive under a “general ‘reciprocity treaty’…so that each may assist the other in collecting intelligence and together circulate the vast amount of news, of politics and literature, that circulates through the thousand columns of the periodical press over the land.” Because “[s]cissors and paste contribute a vast amount to the pleasure and profit of the million newspaper readers of our country,” they “should at least be entitled to their degree of praise.” Editors frequently asserted that “a newspaper is not to be judged so much by the amount of original matter it contains as by its selections” and that the “editing of a paper consists not in long editorials as much as in a diversity of good selections.” The Loudon Free Press reprinted a piece from the “veteran editor of the National Intelligencer” which says, “the mere writing part of editing a paper, is but a small portion of the work.” Instead, the “time employed in selecting, is far more important, and the tact of a good editor is better known by his selection than anything else” (October 27, 1852).
By praising “scissors” as at least equal to the “quill,” such pieces lionize a textual practice in which a collective of editors “together circulate” both news and literature. Indeed, in reflections like those above a newspaper editor’s skill is appraised by how he negotiates the larger network of newspapers: what he borrows and from whom. Circulation and aggregation are marked as creative acts equal to writing. In the antebellum US, then, reprinted texts accreted both content and authority through circulation. For antebellum newspaper readers, a cloud of interdependent newspapers—not unlike a modern network graph—underlay their reading experience, which was shaped by the continuous circulation of content to and from their local newspaper of choice. The network-authored newspaper was an assemblage of valuable, morselized information composed and experienced in community.
Considered at scale, reprinted newspaper texts offer a newly expansive view of the field of antebellum literary production. A macroanalytic approach to newspapers does not in this case seek to change entirely the types of literary-historical evidence we value, but it does expose a distinct corpus of widely-reprinted texts difficult to disambiguate from the archive—print or digital—without computational assistance. This new corpus foregrounds understudied but historically important newspaper genres such as the listicle or recipe and by consequence shifts our view of more familiar literature. Widely reprinted newspaper selections also exemplify a collective idea of authorship in newspaper culture, which valued savvy aggregation over original composition. Computational approaches to the periodicals archive allow us to discern such trends, and to begin making sense of textual circulation and influence across the network of print production. The patterns and models of textuality revealed through computational means, however, point us back toward the archive, suggesting new questions about reprinted texts, their circulation, and the wider system of print culture that demand further study at both the macro and micro scales.
Ultimately, both this article and the larger Viral Texts project from which it stems demonstrate the promise of the digitized archive for transformative literary historical research. While wider access to cultural materials is certainly a laudable outcome of the past decades of digitization, the digitized archive cannot only provide more and bigger shelves for us to browse in essentially the same ways we have approached the physical archives. A poor facsimile of the physical artifact from which it is derived, the digitized newspaper lacks the heft, crinkle, and must of those papers, and thus levels a diverse set of publications into a common interface and reading experience.
The digitized archive, however, holds out hope—mostly unrealized—to become, in Jerome McGann’s words: “a research tool with greater powers of consciousness” that can be read as a newspaper or computationally, where it may “generate readerly views of its information that cannot be had in the codex” form (13). The essential question for literary historians is not whether computational methods of reading can work, but what kinds of patterns would provide substantive insight that can only be had at the scale of corpora? Anyone who has been underwhelmed by a word cloud knows not all patterns signify as fully or convincingly. As Jockers admits, “Word-frequency lists, concordances, and keyword-in-context (KWIC) lists,” which have been the primary computational tools for working with texts until quite recently, “hardly satiate the appetite for more. These tools only scratch the surface in terms of the infinite ways we might read, access, and make meaning of text.” I agree that it is only quite recently, after decades of corpora digitization and exploratory computational work, that we have reached the point “where enough text and literature have been encoded to both allow and, indeed, force us to ask an entirely new set of questions about literature and the literary record” (4). I would suggest that it is also only quite recently, as we have been able to consider digital corpora from the other side of that tipping point, that scholars in literature, history, and related fields can begin to home in on essential literary-historical questions that could only be answered with computational assistance. We are living, in other words, in interesting times, when a critical mass of humanities data is available to scholars increasingly primed to make good sense of it. Reprinting is one pattern that the digitized archive makes more salient and knowable, even as it offers literary historians a model of the kinds of questions we might ask of digitized corpora in future studies.
- Anderson, Benedict, Imagined Communities: Reflections on the Origin and Spread of Nationalism, New York (Verso), 1991.
- Baldasty, Gerald J., The Press and Politics in the Age of Jackson, Journalism Monographs no. 89, Columbus (U of Ohio P), 1984.
- Bastian, M., S. Heymann, and M. Jacomy, “Gephi: an open source software for exploring and manipulating networks,” International AAAI Conference on Weblogs and Social Media, 2009.
- Cohen, Lara Langer, “Mediums of Exchange: Fanny Fern’s Unoriginality,” ESQ: A Journal of the American Renaissance 55.1 (2009).
- Cordell, Ryan, “‘Taken Possession of’: The Reprinting and Reauthorship of Hawthorne’s ‘Celestial Railroad’ in the Antebellum Religious Press,” Digital Humanities Quarterly 7.1 (2013), http://www.digitalhumanities.org/dhq/vol/7/1/000144/000144.html (accessed 17 March 2014).
- Cragin, Thomas J. “The Failings of Popular News Censorship in Nineteenth-Century France,” Book History 4 (2001).
- Crain, Patricia, “Reading Childishly? A Codicology of the Modern Self,” Comparative Textual Media: Transforming the Humanities in the Postprint Era, ed. N. Katherine Hayles and Jessica Pressman, Minneapolis (U of Minnesota P), 2013.
- Darnton, Robert, “What Is the History of Books?” Daedalus 111.3 (1982).
- De Tocqueville, Alexis, Democracy in America, part 2, trans. Henry Reeve, New York (J. & H. G. Langley), 1840.
- Faflik, David, “Authorship, Ownership, and the Case for Charles Anderson Chester,” Book History 11 (2008).
- Garvey, Ellen Gruber, Writing with Scissors: American Scrapbooks from the Civil War to the Harlem Renaissance, Oxford (Oxford UP), 2012.
- Gitelman, Lisa (ed), “Raw Data” is an Oxymoron, Cambridge, MA (MIT P), 2013.
- Hayles, N. Katherine and Jessica Pressman, “Making, Critique: A Media Framework,” Comparative Textual Media: Transforming the Humanities in the Postprint Era, ed. N. Katherine Hayles and Jessica Pressman, Minneapolis (U of Minnesota P), 2013.
- Jackson, Leon, The Business of Letters: Authorial Economies in Antebellum America, Stanford, CA (Stanford UP), 2008.
- Jockers, Matthew L., Macroanalysis: Digital Methods and Literary History, Urbana (U of Illinois P), 2013.
- Lee, Maurice S., “Falsifiability, Confirmation Bias, and Textual Promiscuity,” J19 2.1 (2014): 162-171.
- Loveland, Jeff and Joseph Reagle, "Wikipedia and Encyclopedic Production," New Media & Society 15.8 (2013): 1294-1311.
- McGann, Jerome, "The Rationale of Hypertext," Text 9 (1996): 11-32.
- McGill, Meredith L, American Literature and the Culture of Reprinting, 1834-1853, Philadelphia (U of Pennsylvania P), 2007.
- Moretti, Franco. The Bourgeois: Between History and Literature, Kindle Edition (Verso Books), 2013.
- Mussell, James, The Nineteenth-Century Press in the Digital Age, New York (Palgrave Macmillan), 2012.
- Pease, Donald, "Author," in Critical Terms for Literary Study, ed. Frank Lentricchia and Thomas McLaughlin, Chicago (U of Chicago P), 1995.
- Rice, Grantland S., The Transformation of Authorship in America, Chicago (U of Chicago P), 1997.
- Wilkens, Matthew, “The Geographic Imagination of Civil War-Era American Fiction,” American Literary History 25.4 (2013).
 This article is indebted to the interdisciplinary team that comprises the larger Viral Texts project, which has been generously supported by the Northeastern University Research Office and the National Endowment for the Humanities. My colleague in Computer Science, David Smith, developed the text-reuse detection algorithm that undergirds the entire effort. I thank him for his generous intellect and keen curiosity about historical and literary research. We have benefited greatly from Elizabeth Maddock Dillon’s wisdom about how computational methods can supplement literary-historical research and, just as importantly, about where computational methods fall short. Finally, the Viral Texts work would be impossible without the substantial contributions of graduate assistants Abby Mullen, Peter Roby, Kevin Smith, and Matthew Williamson, who have done yeoman's work researching and annotating our reprinting data, lending valuable context both to the textual clusters we have identified and to our understanding of the publications from which those reprinted texts were drawn. I want to single out Abby Mullen, who has performed research feats over and again tracking down bibliographic details about individual newspapers and biographical information about their editors. I cannot overstate the rich historical context Abby has brought to this project. In the article I have attempted to signal her direct contributions with (AM).
A newspaper in Houma, Terrebonne Parish, Louisiana, which was primarily an agricultural paper but also reported on politics from a nativist and pro-slavery position. (AM)
In The Transformation of Authorship in America (U of Chicago P 1997) Grantland S. Rice claims "the gradual development of the idea of literary property gave birth to the profession of authorship in America" (79). I do not contest this characterization here, but offer the network-author as a way of describing the larger bulk of writing that proceeded from a very different paradigm, within media even more gradually encompassed by those developing ideas of literary property.
Similarly, David Faflik situates Philadelphia writer George Lippard "within the framework of an antebellum print culture" with extremely flexible "categories of authorship, and thus ownership, and hence attribution," and "in which collaborative composition had become the rule" (149). Within this context, Faflik argues that the pamphlet novel Charles Anderson Chester should not be read as plagiarism of Lippard's 1850 novel, The Killers, but instead as typical, collaborative production. Writing about Charlotte M. Yonge's Aunt Charlotte's Stories of Bible History, Leslee Thorne-Murphy names such practice "reauthorship: a combination of successive individuals writing, editing, and rewriting in a way that shapes anew the image of a single author" (84).
For more on the types and purposes of nineteenth-century reprinting practices, see the introduction to Candy Gunther Brown's The Word in the World (2004); Kyle Roberts's "Locating Popular Religion in the Evangelical Tract: The Roots and Routes of the Dairyman's Daughter" in Early American Studies (2006); Bob Nicholson's "'You Kick the Bucket; We Do the Rest!': Jokes and the Culture of Reprinting in the Transatlantic Press" in the Journal of Victorian Culture (2012); and Will Slauter's "Understanding the Lack of Copyright for Journalism in Eighteenth-Century Britain" in Book History (2013). That latter issue of Book History also includes Meredith McGill's useful overview of scholarship on copyright and related matters: "Copyright and Intellectual Property: The State of the Discipline." For more on print culture and networks, see the 2013 themed issue of American Periodicals (23:2) on “Networks and the Nineteenth-Century Periodical” and Laurel Brake’s “‘Time’s Turbulence’: Mapping Journalism Networks” in Victorian Periodicals Review 44:2 (2011).
Franco Moretti first articulates the idea of "distant reading" in "Conjectures on World Literature," New Left Review 1 (January-February 2000) and outlines a plan for such work in Graphs, Maps, Trees (2005) and Distant Reading (2013). In a recent issue of American Literary History, for instance, Matthew Wilkens demonstrated how computational analysis of geographic terms across the Wright American Fiction collection challenges several scholarly assumptions, from the idea of New England as the central site of mid-nineteenth-century literary attention, to our claims about the emergence of literary regionalism immediately following the Civil War, to our use of the Civil War as an essential periodizing marker. Wilkens concludes with the provocative suggestion that "incongruity of magnitudes [between what seems significant in an individual text and what patterns hold across collections] will be something with which we'll often need to grapple as the objects of our literary-historical analysis move to the corpus level" (830).
In many ways this project takes up a call by Gerald J. Baldasty in 1984 to supplement studies of "individual newspapers and editors" with "aggregate analysis" that spans the national scene of printing in the early nineteenth century (2). This work owes much to David J. Russo on the ways that the nineteenth-century press created local news while aggregating national and international news, Ronald J. Zboray on the complex relationships between technological change and print culture during the antebellum period, Richard D. Brown on the spread of information during the early republic and antebellum periods, David Paul Nord and Candy Gunther Brown on the close ties between evangelical belief and rise of American mass media, and Trish Loughran on the ways print culture both constituted and complicated notions of the nation in nineteenth-century America.
This article was written using an early Viral Texts dataset, which included only reprinted texts from before 1861. The data and algorithmic approach of the Viral Texts project are described in greater detail in the methods supplement to this article, which is available online. The iteration of the Viral Texts data used to write this article is available in a flat CSV file in a GitHub repository, as is the network data used in section 3. A more recently updated (and indeed continually expanding) alpha database of the reprinted clusters we have identified can be found at http://viraltexts.northeastern.edu. This database now includes clusters of reprinted texts from the Chronicling America newspapers up to 1899, as well as from the Library of Congress’ Making of America archive magazines hosted by Cornell University. This online database is still in active development and not yet fully featured; we encourage you to watch the introductory video before browsing the cluster data.
For more on the precise details of our corpus and its limitations, please consult the online methods article.
"Eloquent Extract" reflects a common naming practice for reprinted newspaper snippets that often makes distinguishing them difficult. Many pieces were titled "An Eloquent Extract" or "A Beautiful Reflection," when the pieces were given titles at all. Parodies of MacKay’s poem (and other poems) are sometimes clustered with the original by our algorithm, particularly if the parody uses enough of the same language as the original. The poem was variously titled "The Resting Place," "Where the Soul Shall Find Rest," and "Wish," and was attributed at least once (in the Nashville Union) to Lady Flora Hastings. I use the construction "at least X times" because the precise size of clusters does shift as we adjust the parameters of our algorithm. We describe this in more detail in the online methods article, but in short, when we adjust the parameters to look for shorter segments of aligned text, the algorithm produces larger clusters, but these clusters also include more false-positive matches. When we look for longer segments, the clusters produced are smaller but more reliable.
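The tradeoff between segment length and cluster reliability can be illustrated with a toy sketch. This is not the Viral Texts algorithm, which is described in the methods article; it simply assumes that two texts "match" when they share at least one run of n consecutive words. The function names and the three snippets are invented for illustration.

```python
from itertools import combinations

def shingles(text, n):
    """Return the set of n-word sequences found in a text."""
    words = text.lower().split()
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}

def matched_pairs(texts, n):
    """Pair up texts that share at least one n-word sequence."""
    grams = {name: shingles(body, n) for name, body in texts.items()}
    return {
        (a, b)
        for a, b in combinations(sorted(texts), 2)
        if grams[a] & grams[b]
    }

# Invented snippets: two variant reprints of one item, plus an unrelated item
# that happens to share a short phrase with the first reprint.
texts = {
    "reprint_1": "be kind to the printer and pay him before the year is out",
    "reprint_2": "be kind to the printer and pay him when the year is out",
    "unrelated": "the sheriff will pay him before the week ends",
}

loose = matched_pairs(texts, n=3)   # shorter segments: more matches
strict = matched_pairs(texts, n=5)  # longer segments: fewer matches

print(sorted(loose))
print(sorted(strict))
```

With three-word segments the unrelated item is pulled in alongside the first reprint, a false positive; with five-word segments only the two genuine reprints are paired.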
There are few academic citations for the listicle, but Arika Okrent defined the genre succinctly for The University of Chicago Magazine (January/February 2014): "A listicle is an article in the form of a list." Okrent accounts for the dominance of the listicle online—particularly on sites such as Buzzfeed—and concludes by defining a listicle in the listicle form:
Eight fun facts about the listicle
- A listicle is an article in the form of a list.
- It is kind of like a haiku or a limerick.
- It has comforting structure.
- It makes pieces.
- It puts them in an order.
- Language does that too.
- Sometimes with great difficulty.
- Lists make it look easier.
That last maxim did not appear in all of the reprintings. In fact, the individual maxims in this listicle were frequently changed—by addition, subtraction, or amendment—as the list was published in different newspapers and magazines. "Maxims to Guide a Young Man" was frequently reprinted under different titles, most commonly "Maxims to Guide a Young Merchant," in publications where it was offered as a guide not only to living well but also to conducting one's business effectively. We have investigated this particular text's print history in other archives, discovering the piece’s long afterlife all the way into the twenty-first century. Remarkably, the most recent reprinting, uncovered by research assistant Peter Roby, comes from a self-published 2007 book, The Driver's Handbook by Arthur E. Hanvold, where it appears under the heading "Personal Principles" and includes modern additions such as "Drink no intoxicating drinks—then drive" alongside distinctively nineteenth-century artifacts such as "Marry only when you are able to support a wife." We discovered this most recent reprinting through Google Books.
June 23, 1859.
"The Huskers" is the most common title assigned this poem in the newspaper reprintings we identified, though it was also called "Song of Labour" and "The Corn Song." Whittier published it as "The Huskers," which includes "The Corn Song" as a sub-poem within it. Both are part of Whittier's longer collection Songs of Labor (1850).
"Starching Linen" was printed in some form in at least 50 newspapers. It was given many titles, including most commonly "Gum Arabic Starch," "A Recipe for Gum Arabic Starch," and "How to Do Up Shirt Bosoms." Though this Nashville Union and American reprinting is the earliest we automatically identified, the Union claims to have reprinted the piece from the Augusta Chronicle, a newspaper not yet available digitally from the Library of Congress. Our uncovered texts quite often point to other reprintings outside of our data set. We see these moments as both a boon and a bane for computational work. While we know that we cannot now and will never automatically capture all reprints—not least because most historical newspapers will always be unavailable in digital form—the reprintings we do capture inevitably offer valuable bibliographic details we would otherwise not have known.
This article was reprinted at least 27 times before 1861 and first appears in our findings in the Middlebury People's Press of October 12, 1841.
This table appears first in our findings in the Mountain Sentinel of October 23, 1851 and was reprinted in at least 33 newspapers.
This squib appears first in our findings in the Rutland County Herald of May 14, 1853 and was reprinted in at least 32 newspapers.
"Ancient Antiquities" was printed at least 26 times before 1861 and appears first in our findings in the Juliet Signal of May 23, 1848.
This particular commendation appears in the introduction to “an improved and extended series of Chambers’s Information for the People” published in Edinburgh, London, and Dublin beginning in January 1841. Similar rhetoric introduces editions of the series published on both sides of the Atlantic through the nineteenth century.
This "Newspapers" snippet was first identified in our study in the Columbia Democrat of December 30, 1837 and reprinted at least 30 times. The Judge mentioned is likely Augustus Longstreet, a Southern lawyer, judge, and minister who also wrote for periodicals and published Georgia Scenes (1835), a book of Southern humor sketches which were themselves originally published in newspapers.
"The Influence of a Newspaper" was reprinted at least 35 times under several headlines. We first identify it in the Vermont Watchman and State Journal of February 5, 1852, which claims to be reprinting it from the Ogdensburgh Sentinel.
"Ladies Should Read the Newspaper" is first identified in our study in the Holmes County Republican of November 11, 1858 and reprinted at least 23 times.
Other pieces about the education of women appear in our most frequently reprinted clusters. A piece called "Educate Your Daughters" was reprinted in at least 24 newspapers and draws a lesson from the writer's supposed conversations with "the Choctaw Indians." In the snippet the writer urges readers to send their daughters to school, but only so their sons will not "marry uneducated and uncivilized wives" and so their wives will be well prepared to "educate their sons."
Such as "The Farmer's Creed," which was reprinted in at least 30 newspapers and affirmed, "We believe in small farms and thorough cultivation"; in "large crops"; and in "good fences, good barns, good farm houses, good stock, good orchards"; among other things.
Such as a piece often titled "The Clock at Strasburg." This snippet is cited by the Columbia Democrat (March 20, 1847) as originating in "a recent letter to the Liberator" from Henry C. Wright. In it Wright describes both the people in the Strasburg city square and the operations of its elaborate clock on the hour.
A preponderance of our most widely spread pieces are political, including reprintings of each State of the Union address (or, more accurately, each Presidential Message). I have not yet found the right language for discussing these pieces, which depend less on an editor's discretion than on the national occasion that necessitates their reprinting.
Such as Charles Lamb's "Confessions of a Drunkard," which was reprinted in at least 24 newspapers.
We will soon be able to make such claims with more precision. We are currently annotating our findings with metadata such as author, title, and so forth (none of which are reliably detected by the algorithm, which only identifies matching text between snippets). From this annotation work, however, I am confident that more widely reprinted pieces circulated without an author's name attached than with one.
"Printers' Proverbs" is first identified in our study in the Columbia Democrat of August 26, 1837 and reprinted at least 21 times.
During the Q&A at the "New Media in American Literary History" conference hosted at Northeastern University, December 2013.
This network graph was created in Gephi, an open-source platform for building network visualizations. I used "pairwise" data from the Viral Texts project to generate these graphs: simple tables that show pairs of shared reprints from our total findings thus far.
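As a rough illustration of how such pairwise tables become a network, the sketch below collapses rows of newspaper pairs into weighted, undirected edges and writes them out with Source, Target, and Weight columns, the headers Gephi's spreadsheet import conventionally recognizes. The input column names (paper_a, paper_b) and the sample rows are assumptions made for the example, not the layout of the project's actual files.

```python
import csv
import io
from collections import Counter

# Hypothetical pairwise rows: each row is one reprinted text shared by two papers.
PAIRWISE_CSV = """paper_a,paper_b
Columbia Democrat,Mountain Sentinel
Mountain Sentinel,Columbia Democrat
Columbia Democrat,Nashville Union
Columbia Democrat,Mountain Sentinel
"""

def weighted_edges(rows):
    """Collapse pairwise rows into undirected edges weighted by shared reprints."""
    counts = Counter()
    for row in rows:
        # Sort the pair so (A, B) and (B, A) count as the same undirected edge.
        edge = tuple(sorted((row["paper_a"], row["paper_b"])))
        counts[edge] += 1
    return counts

def to_edge_list_csv(edges):
    """Serialize edges as CSV text with Source, Target, Weight columns."""
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["Source", "Target", "Weight"])
    for (a, b), weight in sorted(edges.items()):
        writer.writerow([a, b, weight])
    return out.getvalue()

edges = weighted_edges(csv.DictReader(io.StringIO(PAIRWISE_CSV)))
edge_csv = to_edge_list_csv(edges)
print(edge_csv)
```

Weighting each edge by the number of shared reprints is one design choice among several; a graph could just as easily record only whether two papers ever shared a text.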
For a full explanation of how we have grouped newspapers into “families,” see our online methods paper that accompanies this article.
Of Indianapolis, Indiana and Ottawa, Illinois, respectively.
There are important exceptions to this focus on Eastern, urban print centers, such as David J. Russo's The Origins of Local News in the U.S. Country Press, 1840s-1870s (1980).
Importantly, the Chronicling America data includes few newspapers from those print centers. We fully expect that our network model will change dramatically as we incorporate more data from New York and similar print centers, and that the centrality of publications such as these will diminish. Nonetheless, our current models do point to publications outside those print centers worthy of further attention, as these seem to be the publications that propagate information, which often originated along the East Coast, into the South and West.
Indeed, a simple search for the word "network" in Chronicling America's pre-1861 newspapers reveals that the word itself is not markedly anachronistic. Though relatively rare, antebellum newspapers did use "network" in its modern, metaphorical sense, to discuss complex, intertwined systems. For instance, on February 12, 1852 the Mountain Sentinel noted in an article about Austria that the nation's "commerce is hampered by all manner of monopolies, and is involved in such a complex network of restrictions, as on the industrious, gold-getting fingers of a few can unravel," while on May 1, 1860 the Daily Gazette and Comet declared that if readers would "[s]earch deep enough," they would "generally find that the customs of every people are the joint result of many causes acting together—a great network of necessity and compensation."
"A Scissorless Editor" appeared in the Sunbury American and Shamokin Journal, August 21, 1843.
"Scissors," December 29, 1854.
"The Mails &c. &c.," Weekly North Carolina Standard (May 4, 1853) and an unnamed editorial in the Winchester Home Journal (May 6, 1858).
One of the simplest expressions of distant reading is the word cloud: a visualization of word use from a particular corpus—whether a single article or a thousand Victorian novels—in which the most frequently used words are shown larger than less frequently used words. While there are certainly uses for such blunt analytical instruments, when applied in serious academic arguments they can be underwhelming. The simple fact that a particular word is used often does not necessarily point to its importance to the work. While the patterns uncovered through word clouds and similar text analysis tools can occasionally provoke new or unexpected readings of texts, they are more often useful for shoring up current theories—demonstrating that an idea arrived at through close reading may also apply at scale.
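The arithmetic behind a word cloud is correspondingly simple, as the minimal sketch below suggests: count the words, then scale each count to a display size. The stopword list, the size range, and the sample sentence are all assumptions chosen for illustration; actual word cloud tools differ in their tokenizing and scaling.

```python
import re
from collections import Counter

# Hypothetical stopword list; real tools typically use a much longer one.
STOPWORDS = {"the", "a", "an", "of", "and", "to", "in", "is", "that", "it"}

def word_frequencies(text, stopwords=STOPWORDS):
    """Count lowercase word tokens, skipping stopwords."""
    tokens = re.findall(r"[a-z']+", text.lower())
    return Counter(t for t in tokens if t not in stopwords)

def font_sizes(freqs, min_size=10, max_size=48):
    """Scale each count linearly so the most frequent word gets max_size."""
    if not freqs:
        return {}
    top = max(freqs.values())
    return {
        word: min_size + (max_size - min_size) * count / top
        for word, count in freqs.items()
    }

sample = "The scissors and the pen: the editor trusts the scissors more than the pen."
freqs = word_frequencies(sample)
sizes = font_sizes(freqs)

for word, count in freqs.most_common(3):
    print(word, count, sizes[word])
```

Nothing in this calculation weighs a word's importance to the work; it registers frequency alone, which is the limitation noted above.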
Accordingly, the past few years have seen a boom in articles and books that leverage "distant reading"/computational approaches toward clear, persuasive, literary-historical interventions. Most notable are Stephen Ramsay's Reading Machines: Toward an Algorithmic Criticism (2011); Lisa M. Rhody's "Topic Modeling and Figurative Language," Journal of Digital Humanities (Winter 2012); the final chapter of Ted Underwood's Why Literary Periods Mattered (2013); and Matthew Jockers's Macroanalysis: Digital Methods and Literary History (2013).