What Makes Computational Evidence Significant for Literary-Historical Argument?

I’ve been invited to submit an initial position paper for the Arguing with Digital History workshop, to be held at The Roy Rosenzweig Center for History and New Media in September 2017. Following Michelle Moravec’s lead, I’d like to offer my response publicly, most especially because of the ways these thoughts were written alongside and intertwine with Zoe LeBlanc’s provocative questions on Twitter and the valuable thread of responses to them from the community. We have been asked to come to the workshop prepared to modify our views as a result of the discussion, so I’d like to be clear that these are initial and deliberately provocative thoughts which are very open to amendment or even wholesale rethinking. Also, we were asked to keep these position papers to two pages, which I’m already a bit over, so I apologize if some lines of thought feel truncated. There is so much more to say about…well…about all of this.

Argumentation for digital history stumbles over the ontology of its evidence. I’m writing here about corpus-scale analysis, the digital methodology I know best from my work on the Viral Texts project, and variously named by terms like “distant reading” or “cultural analytics.” Though the specifics of these methods are hotly debated, we might gather them under the sign of scale, a notion of “reading”—and I’d like to make that word do more work than perhaps it should—across wider sets of materials than was typical for humanists prior to computational methods.

Recently in literary-historical circles, Katherine Bode has inspired a much-needed discussion about the corpora on which computational analyses are based. Drawing on traditions of book history and textual scholarship, Bode critiques Moretti and Jockers, in particular, as metonymies for distant reading approaches:

Moretti and Jockers construct literary systems as comprised of singular and stable entities, while also imagining that they capture the complexity of such systems in the process. In fact, because their datasets miss most historical relationships between literary works, their analyses are forced to rely upon basic features of new literary production to constitute both the literary phenomenon requiring explanation, and the explanation for it.

Most incisively, Bode shows how much “distant reading” work reconstitutes the primary assumption of close reading: “the dematerialized and depopulated understanding of literature in Jockers’s work enacts the New Criticism’s neglect of context, in a manner rendered only more abstract by the large number of ‘texts’ under consideration.” The problem may be, in other words, not that computational analysis departs from analog methods, but that we interpret the results of corpus-level analysis too much like we interpret individual texts. To be provocative, I might rephrase to say that we don’t yet as a field understand precisely how corpus-scale phenomena make their meaning, or how those meanings relate back to codex-scale artifacts.

I can pick on myself to clarify what I mean (and here I’m paraphrasing some points I make in “Scale as Deformance”). In the Viral Texts project, we have developed sophisticated methods for identifying reprinted pieces across nineteenth-century newspaper corpora. When we find, say, a popular science article that was reprinted 150 times around the world, that cluster of texts can help us think about circulation, genre, and networks of influence among editors in the period. When compared with other texts circulating around the same time, it can teach us something about the concerns, interests, and priorities of readers and editors as well. But a textual cluster is not singular (it is in fact defined by its multiplicity), and the meaning of its reprinting does not evenly distribute across the 150 individual witnesses that make up the cluster. Some of the nineteenth-century editors who reprinted a given piece, and some of the nineteenth-century readers who read it, would have known it was “making the rounds,” and may have had a sense of its wide reach. However, no nineteenth-century person had the corpus-scale perspective on a given cluster that we do from the wild surmise of a CSV file. An article embedded in a complex textual system signifies in both networked and highly local ways, but we cannot easily extrapolate from the meanings we assign a cluster (among many other clusters) to the meanings of its constituent texts, much less the readers of those texts.

There has been much written (including by me!) about the need for zoomable, scalable, or macroscopic reading that puts insights drawn from distinct scales in conversation. However, I would argue that thus far digital (literary) history has not adequately theorized the middle ground between corpus and codex, or developed methods that can meaningfully relate corpus-scale patterns to individual texts without pretending that patterns at each scale can be understood under the same interpretive paradigm. I would go so far as to suggest that the macroscope is not the most useful metaphor for structuring digital historical arguments, as it implies a seamless movement between scales that the realities of analysis belie. Perhaps new metaphors are needed for expressing the continuities and disjunctures between analyses at distinct scales.

Why do scholarly metaphors matter to argument in digital history? We have been so insistent on seamless movement between scales—and so resistant to appearing like positivists or cliometricians—that we have failed to develop field-specific paradigms for interpreting the results of corpus-scale text analyses. What standards we have are imported from other fields such as corpus linguistics, but as such they must be rearticulated and renegotiated for every article, website, or book we publish. More importantly, as Scott Weingart has shown, “methodology appropriation is dangerous” and, frankly, our colleagues are right to look with skepticism on methods imported wholesale from other disciplines. Ted Underwood’s recent “A Genealogy of Distant Reading” offers important context here, noting that, “linguistics may be looming a little too large in the foreground of contemporary narratives about distant reading, so much that it blocks our view of other things,” including forebears in humanities fields prior to computation. We needn’t impugn the practices of disciplines from which we could indeed learn much, but we should insist that imported methodologies be understood, examined, and reimagined to meet the specific needs of literary or historical research.

To cite a specific example, computational historical arguments require models for effective sampling, which might help clarify how analyses at distinct scales relate to one another. To put it bluntly, we have no idea what an effective sample from a literary or historical corpus should look like. What random sample of novels (or newspaper pages, or museum artifact descriptions) could I topic model from a given corpus with some confidence it can represent the larger collection? As humanists we are well prepared to nuance notions of “representativeness,” but those necessary caveats cannot leave us with the answer that sampling must be reinvented for every corpus and every study, which would indeed leave us explicating data in much the same way Cleanth Brooks explicated “Ode on a Grecian Urn.” We also cannot default to the answer that humanists can only sample in the same way that sociologists or political scientists or linguists do. My point is: we lack even rough guidelines around which to debate, but we could have those conversations.

I will end with a too-brief reflection on significance: a word with quite specific meanings in quantitative fields that we cannot port entire into literature or history. In the Viral Texts project, there are certain features of nineteenth-century newspapers we can only study, at least as of yet, through their presence, which makes their statistical significance difficult to gauge. When I write, for instance, that “information literature” is an interesting feature of widely reprinted newspaper texts in the nineteenth century, my standard of significance comes from codex-scale work. I have read a lot of nineteenth-century newspapers and so understand these genres in the context of their medium. From that starting point, information literature seems more common in those pieces we identify as widely reprinted than I would expect. But I cannot estimate the presence of “information literature” in articles that were not reprinted, while the fragmentary coverage of our corpora (to return to Bode) ensures that many reprinted pieces are not identified as such, as their source newspapers are either not digitized or are included in other corpora to which we do not have access.

While I mostly agree with Weingart’s more recent claims that “[c]omputational history has gotten away with a lot of context-free presentations of fact,” I would insist that comparative statistics are not the only, or often the most compelling, method for building such context. When I write about “information literature” as significant, I don’t mean that it appears more often than it would in some theoretical null corpus. I am not talking about a p-value. As Weingart mentions, however, we might look also toward other kinds of “deviations from expectations,” including expectations set by previous field literature. I note the prevalence of reprinted information literature as conspicuous given the dearth of critical attention paid to information literature in prior literary-historical criticism. Very few scholars have attended seriously to short, popular science; trivia; recipes; household tips; listicles; and related genres despite the fact that they filled newspapers and circulated around the globe. There might be a reason to work toward measuring the statistical significance of information literature. We could train a classifier using our extracted information literature, for instance, and then attempt to discern how many non-reprinted newspaper texts are information literature. From there we could compare the proportion of the genres in reprints to their proportion in the larger corpus. But if our goal is to make arguments that will impact literary or historical criticism, it is far more essential that the patterns we trace computationally speak to significant questions or gaps of attention in our disciplines. There is nothing wrong with using statistical measures as evidence, but such measures cannot be the extent of our accounts.
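The proportional comparison imagined above could be sketched roughly as follows. This is a minimal illustration in Python and not a method the Viral Texts project has used. The two label lists are invented placeholders standing in for the output of a hypothetical genre classifier, the function names are my own, and the pooled two-proportion z-test is only one conventional way among several to compare such proportions.

```python
import math

def proportion(labels):
    """Share of texts flagged as the genre of interest."""
    return sum(labels) / len(labels)

def two_proportion_z(k1, n1, k2, n2):
    """Pooled two-proportion z statistic and two-sided p-value.

    k1 of n1 texts in the first group carry the label,
    k2 of n2 texts in the second group carry it.
    """
    pooled = (k1 + k2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    if se == 0:
        raise ValueError("no variation in labels; the test is undefined")
    z = (k1 / n1 - k2 / n2) / se
    # Two-sided p-value under the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value

# ASSUMPTION: invented labels, True = classified as information literature.
# Real labels would come from a classifier trained on extracted examples.
reprinted_labels = [True] * 30 + [False] * 70   # texts found in reprint clusters
other_labels = [True] * 10 + [False] * 90       # texts not identified as reprinted

p_reprinted = proportion(reprinted_labels)
p_other = proportion(other_labels)
z, p_value = two_proportion_z(
    sum(reprinted_labels), len(reprinted_labels),
    sum(other_labels), len(other_labels),
)

print(f"reprinted: {p_reprinted:.2f}, other: {p_other:.2f}")
print(f"z = {z:.2f}, two-sided p = {p_value:.4f}")
```

Any figure produced this way would inherit the problems already described: the "other" group includes reprinted pieces whose source newspapers are missing from the corpus, and classifier error would blur both proportions. The statistic is one piece of evidence, not a substitute for the disciplinary argument.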

For corpus-scale analyses to resonate with humanities scholars, we must be “more ambitious,” as Miriam Posner has urged, in “rebuilding the machinery of the archive and database so that it doesn’t reproduce the logic” of exclusion and marginalization embedded into computational tools. Posner worries that digital humanists “seem happy to flatten the world into known data structures,” to which I would add that we seem likewise happy to flatten our data mining to methods and measures established in other disciplines. Part of rebuilding the machinery requires us to articulate discipline-specific models for relating text and corpus without collapsing them into each other. I am drawn again and again to Lauren Klein’s description of topic modeling as “a technique that stirs the archive,” and such stirring remains to my mind the most compelling use for computational analyses in literary-historical corpora. But we need a better vocabulary for describing the composition of our archives, the outcomes of our archival remixing, and the interpretive space in between.