The following talk was delivered to the 1st annual meeting of the Andrew W. Mellon Society of Fellows in Critical Bibliography in Charlottesville, Virginia, on May 24, 2018.
I want to begin with this self-serving, poetical(ish) selection, printed in more than two hundred newspapers during the nineteenth century.
Take the Paper
We find the following going the rounds of the press. Read, ponder, and—PAY UP! Why don’t you take the papers? they’re the life of my delight, except about election time, and then I read for spite. Subscribe, you cannot loose a cent, why should you be afraid? for cash thus spent is money lent at interest, four-fold paid. Go, then, and take the papers, and pay to-day, nor pay delay, and my word it is inferred, you’ll live until you’re gray. An old neighbor of mine, while dying of a cough, desired to hear the latest news while he was going off. I took the paper and I read of some new pills in force; he bought a box—and is he dead? no—hearty as a horse. I knew two men as much alike as e’er you saw two stumps; and no phrenologist could find a difference in their bumps. One takes the paper and his life is happier than a king’s, his children can all read and write, and talk of men and things. The other took no paper, and, while strolling through the wood, a tree fell down and broke his crown, and killed him—”very good.” Had he been reading all the news, at home like his neighbor Jim, I’ll be a cent that accident would not have happened him, for he who takes the paper, and pays his bill when due, can live in peace with every man, and with the printer too.
It’s a clever (enough) nugget of self-promotion, and general enough that editors could easily excerpt and reprint it as the text circulated. This is one text identified through the Viral Texts project at Northeastern University, which seeks to better understand how texts circulated through nineteenth-century newspapers. In a period when newspaper content was not protected by intellectual property law, editors swapped papers through “the exchange systems,” assisted by U.S. postal laws, so that most papers worked more as aggregators than springs of original writing. Many nineteenth-century editors saw their primary role as “selection” rather than “editorializing,” and their primary media as scissors and paste. Because they subscribed to many newspapers, editors had a wider view of the exchange system ecosystem, and they often wrote of widely-reprinted selections “going the rounds of the press.”
Sometimes newspapers used this phrase to express skepticism over the information circulating among their peers, as when the Arizona Sentinel (17 February 1877) wrote coyly of a reported small-pox cure, “the following is going the rounds of the press and we give it for what it is worth.” Most often, however, “going the rounds” signaled only circulation, subtly reminding readers of the systems of exchange, selection, adaptation, and reprinting that brought items into their local papers, as when the Ohio Organ, of the Temperance Reform (22 July 1853) recounted a letter, written by Daniel Webster, which was “going the rounds of the temperance press” and in which the “distinguished statesman” offered “a fine specimen of non-committalism (sic)”.
The selections that circulated through the nineteenth-century exchange network pose intellectual challenges for bibliographers and literary historians, asking us to think collectively about texts, genres, and publication, to imagine linked sets of rapidly mutating texts that stretch across geographic and temporal space. The problem of nineteenth-century newspaper circulation can also offer interpretive purchase on the questions raised by our new digital archives and the computational models through which we increasingly identify, enumerate, and analyze texts across large fields. Computation can reorganize the nineteenth-century newspaper archive in useful and sometimes enlightening ways, but we require more robust frameworks to account for the models under which our research proceeds. Because bibliographic methods attend closely to textual similarity and difference, bibliography has much to contribute here, particularly when its methods dovetail with experimental computational methods.
Earlier in the digital age—indeed, well before what most of us consider “the digital age”—bibliographer Thomas Tanselle worked to expand the intellectual frames of bibliography for a new publication environment in which texts could be typed in a word processor and proliferate almost instantly and almost infinitely. Tanselle proposed that the definition of edition be expanded to include “all copies resulting from a single job of typographical composition.” Tanselle recognized that keying letters at a computer was an “act of assembling…letterforms” akin to composing in cold or hot type.
In a recent article in Book History, I argue that humanities scholars need to take optical character recognition (OCR) seriously as a material and cultural artifact—to apprehend the OCR layers which are typically covered by the interfaces of mass digital archives and grapple with the technical, social, and political structures through which such resources are created. OCR software scans digitized page images, attempting to recognize the letterforms on the images and transcribe them into a text file. I argue that we might consider OCR a species of compositor, setting type in a language it can see but not comprehend, and thus that OCR data derived from a historical text is a new edition of the text: a copy “resulting from a single job of typographical composition” by OCR software.
Work like ours on Viral Texts forces another reconsideration that asks, from the standpoint of textual criticism and bibliography, how we might understand texts linked or collated by inference and/or probability. I would propose the speculative edition: all texts associated through a single computational model. In the Viral Texts project, our algorithm groups regions of textual data containing
- a given number of matching phrases
- of a particular word length (i.e. n-grams)
- that appear within a passage of a given length
I’m simplifying things too much (and there’s much more detail in our ALH methods paper), but those are the three major variables: How many words long should the matching phrases be? How many such phrases need to match? And how close together on the page must those matching phrases be? Our computational model of nineteenth-century reprinting, then, represents a “reprint” as bundles of matching phrases that occur in relatively close proximity to each other across multiple documents. Adjusting these variables results in more restrictive or capacious models of reprinting. Understanding how well any specific implementation of this model—any specific designation of the algorithm’s variables—does in fact represent the practice of nineteenth-century newspaper reprinting constitutes a substantial part of the intellectual labor that goes into my collaboration with David Smith, my principal computer science collaborator.
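A minimal sketch may make these three dials concrete. What follows is a toy illustration only, not the Viral Texts implementation (our actual algorithm, described in the ALH methods paper, must contend with OCR noise and scale to millions of pages); the function names and default thresholds here are hypothetical.

```python
def ngram_positions(text, n=5):
    """Map each word n-gram in `text` to the position where it first occurs."""
    words = text.lower().split()
    grams = {}
    for i in range(len(words) - n + 1):
        grams.setdefault(" ".join(words[i:i + n]), i)
    return grams

def likely_reprint(doc_a, doc_b, n=5, min_matches=2, max_span=50):
    """Toy reprint test built on the three variables described above:
    phrase length (`n`), number of matching phrases (`min_matches`),
    and how close together the matches must fall (`max_span`)."""
    a = ngram_positions(doc_a, n)
    b = ngram_positions(doc_b, n)
    shared = sorted(a[g] for g in a.keys() & b.keys())
    if len(shared) < min_matches:
        return False
    # proximity constraint: matching phrases must cluster within one passage
    return shared[-1] - shared[0] <= max_span
```

Raising `n` or `min_matches` yields a more restrictive model of reprinting; raising `max_span`, a more capacious one.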
The clusters of matching texts output by Viral Texts’ algorithms are, in a sense, enumerative bibliographies: lists of textual witnesses and the metadata (e.g. date, title) of the newspapers that reprinted them. But not quite. Really, each cluster lists text segments that are similar enough under our algorithm’s constraints, which shift as we experiment with the parameters described above. The resulting clusters are generally reliable as lists of witnesses, but there are inevitably false positives and false negatives. Moreover, our clusters include species of repetition beyond witnesses, such as extended quotations of one text within another, or partial excerpts of longer texts embedded within other texts.
As an example of the uncertain boundaries these methods highlight, we might consider “Twelve Ways of Committing Suicide.” This text is what we might call a “listicle,” an article formatted as a list, often with its items numbered, and one which strongly resembles twenty-first-century listicles on websites such as Buzzfeed. The title is humorous, as the list enumerates a series of habits that “produce more sickness, suffering, and death, than all epidemics, malaria and contagion, combined with war, pestilence and famine.” These causes of “indirect suicide,” as some reprints styled them, include habits:
- sartorial: “1. Wearing of thin shoes and cotton stockings on damp nights, and in cool rainy weather.”
- intellectual: “2. Leading a life of enfeebling, stupid laziness, and keeping the mind in an unnatural state of excitement by reading trashy novels.”
- gastronomical: “4. Surfeiting on hot and very stimulating dinners. Eating in a hurry, without half masticating your food, and eating heartily before going to bed every night…”
- matrimonial: “6. Marrying in haste and getting an uncongenial companion, and living the remainder of life in mental dissatisfaction.”
- and psychological: “10. Contriving to keep in a continual worry about something or nothing.”
“Twelve Ways of Committing Suicide” was reprinted at least 170 times in newspapers in the US, UK, and Australia. As it circulated, however, its title changed frequently, as did many of the items in the list. Even naming the text “Twelve Ways of Committing Suicide” solidifies a more fluid textual reality, as only a few of the extant versions we have identified used this title, and many included more or fewer items than twelve.
We might also consider “What I Live For,” an exhortative poem that saw some success circulating through the exchange system as a full, five-stanza poem, though its cited author changed frequently. The poem’s first stanza, however, circulated far more widely than the whole, as a fragment interspersed with obituaries, speeches, and even posited as the epigraph to “an appropriate pledge prepared and proposed for every Southern Volunteer” in the opening months of the American Civil War. The poem “What I Live For,” then, refracts through many other texts into a field of reprinting, quotation, and appropriation, linked by a computational model of textual similarity.
“Speculative bibliography” might provide an interpretive frame for understanding objects like these, created through computational, statistical, and/or probabilistic methods such as ours in Viral Texts. By framing these textual collections as speculative editions, we avoid myths of ideal texts or digital surrogacy. Our speculative editions manifest in a computational model a theory of the nineteenth-century newspaper rooted not in authors or texts, but instead in circulation and multiplicity. Other speculative editions could foreground different patterns within the archive, such as texts drawing heavily on similar topics (from a topic model) or texts closely aligned through word embeddings.
Bibliographers have proposed other frames for understanding the ways texts mutate as they circulate: a “sociology of texts,” to quote D. F. McKenzie, from which we can compile “fluid” or “social” editions. Our research posits social editions based on textual similarity, but these editions shift as we adjust the assumptions of our model. Our editions are speculative because they are algorithmic and exploratory: each clustering operationalizes “reprinting,” but we understand how texts “went the rounds” not through one clustering but through our iterative engagement with multiple models over time.
My proposal of “speculative bibliography” is largely inspired by the work of Bethany Nowviskie, who began thinking about “speculative computing” in her 2004 dissertation project, and has more recently discussed “speculative knowledge design” for libraries. Nowviskie was inspired by ideas in computer science around “speculative execution,” in which certain computations are performed before it is known they will be needed in a given process, in order to prevent delays if those computations are indeed needed. Quoting computer scientist Randy Osborne, who notes that “A speculative computation will eventually become mandatory or irrelevant,” Nowviskie advocates for the creation of digital tools that enable exploration, creation, and aesthetic provocation. Nowviskie is drawn to the idea that “preoccupations of projection, without regard for the cost of relevancy, could become active: embodied in real work, real artifacts, real happenings and doings, in a digital environment.”
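For readers outside computer science, the borrowed term has a concrete shape. The sketch below is my own illustration, not code from any real system (the function name is hypothetical): a computation is launched before anyone knows whether its result will be needed, and, just as Osborne says, it eventually becomes either mandatory or irrelevant.

```python
from concurrent.futures import ThreadPoolExecutor

def run_speculatively(compute, decide):
    """Start `compute` before knowing whether its result is wanted;
    `decide` settles the question while the speculation is in flight."""
    with ThreadPoolExecutor(max_workers=1) as pool:
        future = pool.submit(compute)  # speculative work begins immediately
        if decide():
            return future.result()     # mandatory: claim the value, no delay
        future.cancel()                # irrelevant: abandon the computation
        return None
```

The payoff is that when the result does turn out to be mandatory, it is already partly (or wholly) computed, and no time was lost waiting for the decision.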
Our models in Viral Texts are speculative: projections that posit alternative shapes for the digital newspaper archive. Certainly our pattern-driven approach produces more textual groups—more relationships—than we can practically explore. A more common technique, keyword search, often does the same. In both cases, the computational model produces both mandatory and irrelevant groupings (and perhaps even some irreverent ones). It’s an omnibus approach to the problem, but I would argue this speculative approach well suits the scale, hybridity, and complexity of the nineteenth-century newspaper. I want to call this work speculative bibliography for two reasons. For one, as scholars such as Matthew Kirschenbaum, Lisa Gitelman, and Alan Galey have well demonstrated, data is itself a material artifact, however abstruse its materiality might seem when compared to artifacts such as books. Digitized historical newspapers don’t float in the ether, but are inscribed on hard drives and servers, as are the models of that data generated through computational processes. Second, I would argue quite simply that computational models enact theories of textual relationship that should be in conversation with theories developed in the bibliographic tradition and vanguard alike.
Each time we revise our reprint-detection algorithm in Viral Texts, we stir the archive anew—or, perhaps, we cut, paste, and reorder archival materials (which were, in the case of our newspapers, themselves cut, pasted, and reordered materials from other media). Some familiar connections are reformed. Presumably, if we’re doing things right, a text that we found reprinted hundreds of times in a previous run will exist in new iterations, though somewhat changed as, for instance, new witnesses are identified and added to the set. Such an iterative research environment is dynamic and ever evocative—and the norm in fields such as computer science—but historical inquiry demands also ways to establish stable points of reference. Ultimately, to solve this problem we have turned to one of the bibliographer’s most longstanding techniques: diplomatic transcription. As we write a Viral Texts book, we transcribe each text—or better, one witness from each textual group—that we reference in our writing.
Importantly, these transcriptions are not intended to freeze the text: to identify a particular witness as the correct one, or to solidify its temporal or textual boundaries. Instead, the transcriptions add a supervised “seed” corpus to our largely unsupervised computational methods—a new loop in the process—that ensures particular textual groups will exist in each iteration and, practically, that we will be able to find them. Crucially, however, this loop ensures that future users and readers who come to Viral Texts will not find precisely the textual clusters created during an analysis in 2017 or 2018 and henceforward locked, but instead will find dynamic, evolving bibliographies related to those we studied and wrote about, but (hopefully) not identical to them. So, rather than finding the 65 reprints of the poem “Beautiful Snow” we identified in pre-Civil War U.S. newspapers during our initial experiments, they would instead find a much-expanded textual field, which at the moment comprises 308 reprints in US, UK, and Australian newspapers between the 1850s and 1890s. As one component of a computational process, the transcription becomes a seed for a speculative edition rather than a singular, static text. I would argue that a wide range of speculative editions will be increasingly needed if we are to make sense of the mass digitized historical record. In an era of information overload, we need new ways beyond keyword search for sorting, reorganizing, and juxtaposing historical records.
In closing I want to point to the work of design scholars such as Mitchell Whitelaw (also a touchstone for Nowviskie in her more recent writing), who advocates for “generous interfaces” that “invite exploration and support browsing” rather than shrinking vast digital resources into a search bar. Whitelaw calls for “multiple, fragmentary representations to reveal the complexity and diversity of cultural collections, and to privilege the process of interpretation.” Whitelaw highlights generous interfaces he helped design that allow users, for example, to browse a collection of Australian periodicals through a mosaic view that can be dynamically shifted between chronological presentation and other algorithmically-generated views, such as one that organizes the collection based on the similarity of color palettes in the magazines’ front pages. Importantly to me, what Whitelaw describes as “generous interfaces” foreground the digitality of digital collections, rather than defaulting to (what I would argue are) failed metaphors of digital-material surrogacy. Generous interfaces do not pretend to offer up unmediated artifacts, but instead explore the digital medium’s capacities to reorganize, layer, or refract materials, sometimes in ways impossible in those materials’ source media.
For bibliographers, work like Whitelaw’s (and others working in experimental interfaces) suggests, essentially, that at least some of the failures in digital interfaces are due not to intrinsic qualities of the medium, but to failures of design thinking—or, said another way, to our failure to think fully or imaginatively about the remediation of historical materials. I use the word “imagination” deliberately in closing, because I want to push us to think beyond the humanities-technology dichotomy we have accepted entirely too uncritically. My work as a book historian convinces me these boundaries are not only strategically worrisome in the twenty-first century, but also rest on historically shaky ground. The work of the humanities is the understanding and interpretation of culture. We cannot abscond from our duty out of some false sense that the new objects pervading our lives are uniquely inhuman or impenetrable. The vast nineteenth-century periodicals archive brings into stark relief the challenges of scale, significance, and missingness that historical archives always pose: there is always “too much to know.” These challenges are, though named differently, also central to much research in computer science, particularly in the subdomains of information retrieval and machine learning. There is much potential for dialogue among our fields, but it is too often stymied by a lack of curiosity on both sides. Speculative bibliography perhaps offers a space for mutual benefit: for experimental, combinatorial, even ludic engagements with our remediated past.