Scale as Deformance

When I was ten years old my parents bought me a microscope set for Christmas. I spent the next weeks eagerly testing everything I could under its lens, beginning with the many samples provided in the box. I could not bring myself to apply the kit’s scalpel to the fully-preserved butterfly—which is intact still in the microscope box in my parents’ attic—but soon I had exhausted all of the pre-made slides: sections of leaves, insect wings, crystalline minerals, scales from fish or lizard skin. The kit also included the supplies to create new slides. I wanted to see blood—my blood. And so with my mom’s help I pricked the tip of my finger with a very thin needle, so I could squeeze a single drop of blood onto the thin glass slide. I remember how it smeared as I applied the plastic coverslip to the top of the slide, and I remember the sense of wonder as I first saw my own blood through the microscope’s lens. Gone was the uniform red liquid, replaced by a bustling ecosystem of red and white cells, walls and enormous spaces where none had been when I was looking with my unaided eye.

Looking at my blood through a microscope, I learned something new and true about it, but that micro view was not more true than familiar macro images. My blood is red and white cells jostling in clear plasma; my blood is also a red liquid that will run in bright-red rivulets from a pin-prick, or clot in dun-red patches over a wound. At micro-scales beyond the power of my children’s microscope, we could focus on the proteins that comprise the membrane of a red blood cell; at even more macro-scales we might consider a blood bank, organizing bags of blood by type for use in emergency rooms.

Grappling with scale is one of the most important and impossible tasks for scholars. What scientists are learning about reality at quantum scales is simply mind-bending, the same sensation provoked by trying to reckon with just how far away the planets photographed by the Hubble telescope really are. Those of us working with texts perhaps don’t imagine our subjects as awe-inspiring in the same way as colliding galaxies or spooky action at a distance but, as Michael Witmore argues:

a text is a text because it is massively addressable at different levels of scale. Addressable here means that one can query a position within the text at a certain level of abstraction…The book or physical instance, then, is one of many levels of address. Backing out into a larger population, we might take a genre of works to be the relevant level of address. Or we could talk about individual lines of print, all the nouns in every line, every third character in every third line. All this variation implies massive flexibility in levels of address. And more provocatively, when we create a digitized population of texts, our modes of address become more and more abstract: all concrete nouns in all the items in the collection, for example, or every item identified as a “History” by Heminges and Condell in the First Folio. Every level is a provisional unity: stable for the purposes of address but also stable because it is the object of address.

Just as atoms can be frozen in place by observation, then, the text can be thought of as “a provisional unity” that has more to do with the questions we wish to ask than with an immutable external reality. Moreover, each act of measurement—each time we freeze the textual system in place in order to make an observation—is an act of deformance. We address this scene, this theme, this argument, this vocabulary, in order to better know this poem, this book, this oeuvre, this corpus. In doing so we learn something true, but we also distort the system, lending outsized importance to our object at the expense of those textual features outside our purview.

Usually we choose a particular textual address in order to better account for something distorted by previous observations, whether that is the representation of a particular racial, cultural, gender, or class group in a set of novels; the influence of a particular genre in a given period; or the presence of particular linguistic patterns across a large corpus. The Viral Texts Project began with an aim to better understand the extent and character of nineteenth-century newspaper reprinting. Using advanced computational methods, we would be able to identify reprints across a large corpus, including those texts that do not explicitly identify themselves as reprints, gaining an interpretive purchase on systemic phenomena that might not even have been completely visible to editors and readers within the system itself. As I have written before, this approach has been wonderfully generative for thinking about popular genres of reading and writing during the period, as well as for helping model the social and technological mechanisms that facilitated textual exchange. As I wrote, “If antebellum circulation was a technology of aggregation and enmeshed social relationships, we can now disambiguate and analyze it—albeit always partially and provisionally—through modern technologies like text mining and visualization.” What I call “disambiguation” here is a particular textual address: we attend not to individual newspapers but to computationally-identified text segments that appear in multiple newspapers. In so doing we can assess trends across newspapers that are not always apparent at the level of an individual issue.
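The detection itself relies on text-reuse alignment methods far more sophisticated than anything that fits in a blog post, but the core intuition, that shared word n-grams are evidence two newspaper texts carry the same passage, can be sketched in a few lines of Python. Everything below, from the sample texts to the five-word shingle size and the overlap threshold, is illustrative only, not the project's actual pipeline:

```python
from itertools import combinations

def shingles(text, n=5):
    """Return the set of n-word shingles (word n-grams) in a text."""
    words = text.lower().split()
    return {" ".join(words[i:i + n]) for i in range(len(words) - n + 1)}

def cluster_reprints(docs, n=5, min_shared=3):
    """Group documents that share at least `min_shared` n-grams.

    docs: dict mapping a document id to its text.
    Returns a list of clusters (sets of document ids), built by
    transitively merging any pair with enough shingle overlap.
    """
    sh = {doc_id: shingles(text, n) for doc_id, text in docs.items()}
    clusters = [{doc_id} for doc_id in docs]
    for a, b in combinations(docs, 2):
        if len(sh[a] & sh[b]) >= min_shared:
            ca = next(c for c in clusters if a in c)
            cb = next(c for c in clusters if b in c)
            if ca is not cb:
                ca |= cb
                clusters.remove(cb)
    return clusters

# Toy corpus: two texts embed the same passage; the third shares nothing.
docs = {
    "A": "the newspaper is the great educator of the age and every family should take one",
    "B": "we agree that the newspaper is the great educator of the age and every family should take one indeed",
    "C": "local corn prices rose sharply at market on tuesday morning",
}
clusters = cluster_reprints(docs)
```

On these toy texts the first two documents fall into one cluster, since the second simply reprints the first with a new lead-in, while the third stands alone; real detection must also cope with OCR noise, partial reprints, and edited variants.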

Of course, this disambiguation is itself a distortion of the textual field. When we read texts as “clusters” of reprints in a spreadsheet or database, we do not read them in the contexts of their original publications. Even if one cluster draws our interest and we seek out specific instances of its reprinting to examine more closely, using the page-images of its source newspapers in Chronicling America, our vantage remains from the cluster looking outward to the newspapers. Take, for instance, a favorite cluster of mine, this self-serving exaltation of newspapers as essential educational media. Encountering it in a database—in essence, as an enumerative bibliography—we learn much about “Newspapers,” including the numeric, temporal, and geographic extent of its reprinting history, at least so far as we have been able to identify in the Viral Texts Project. From this critical orientation, we might select witnesses to examine more closely, tracing changes in words, lines, or paragraphs, perhaps, or even studying what kinds of texts were printed around it, but “the text” would remain the cluster, through which or against which these other elements would resonate. Even my choice of this article, usually titled simply “Newspapers,” indicates how its appearance in a database deforms its textual field. I know to be interested in this text because it was widely reprinted: because it’s a very big cluster in the database, which is where I first encountered it. In the context of any individual newspaper this cluster might not register as significant, but when measured in a group with many other witnesses it begs us to ask: why this text? What does the prevalence of “Newspapers” mean for our understanding of nineteenth-century newspapers, editors, readers, circulation, and so forth? But we should not confuse any truths we glean from consideration of this textual cluster with the final word on this text.
On the page of a specific issue “Newspapers” might indeed be less important, relegated to a tiny corner on the last page, clearly added as filler; while on another page of another issue of another paper it might sit in pride of place on page one. The database/cluster view offers us a set of truths about this text, but not its only or final truths.

So: criticism deforms; any particular textual address distorts the textual field entire in service of illuminating a previously-obscured corner. We might posit the notion of scale itself as deformance, both distorting and generative. There have been many advocates in the digital humanities for methods and projects that shuttle between scales, what Martin Mueller names “scalable reading”: “Digital tools and methods certainly let you zoom out, but they also let you zoom in, and their most distinctive power resides precisely in the ease with which you can change your perspective from a bird’s eye view to close-up analysis. Often it is a detail seen from afar that motivates a closer look.” While many people agree that such movement between scales would be A Very Good Thing, I would argue we largely have not figured out how to do it effectively.

In many discussions of reading scales, I would identify a notion of complementarity or scalability across the body of scholarship rather than within a particular article or project. In Macroanalysis, for instance, Matthew Jockers argues that in literary studies “two scales of analysis”—macro- and microanalysis—“should and need to coexist.” As you would expect from his book’s title, Jockers models macroanalysis, testing at the corpus scale ideas about literature drawn from smaller-scale studies, as well as developing new theories suggested by computational text analysis. But movement between scales is largely not a recursive process in Macroanalysis. Jockers doesn’t, in that book at least, model how his computational findings might restructure a close reading of one novel in his corpus, and how that reading might in turn hone new questions best answered at the corpus scale. At the risk of painting with too broad a brush, the idea of complementarity in Macroanalysis—and in many other works of computational text analysis—itself operates at a macro scale, assuming—rightly, I hope—that the arguments and methods of the book make one critical intervention into a larger community, challenging some of the conclusions drawn by previous observations and open to being challenged by future work.

This approach seems to me perfectly consistent and likely well advised, for the most part, as it allows scholars to hone their methodologies and produce more rigorous scholarship, rather than trying to be all things to all peers. Jockers’ conversation with Julia Flanders, “A Matter of Scale,” is one of the most thoughtful attempts to bridge the gap between large-scale corpus analysis and the intricate editorial scholarship of TEI encoding. While they make a persuasive theoretical case for computational tools that make “it possible to see both scale and detail simultaneously,” there remain few attempts to do so in practice. I find especially valuable those pieces that attempt to model movement between scales within a single project or analysis, such as Lauren Klein’s recent article, “The Image of Absence: Archival Silence, Data Visualization, and James Hemings,” which both applies close literary-historical research toward its model of digital textual data and applies the findings of computational data analysis back to its literary-historical analysis. It is projects like this one, which can construct meaningful conversations across scales, that are most likely to speak not only to other digital humanists, but also to their disciplinary fields—and perhaps to provide a bridge for more humanities scholarship that critically engages macro-scale research.

Which brings me to our most recent experiment in the Viral Texts Project. “A ‘Stunning’ Love Letter to Viral Texts” is an exhibit, built in Neatline, that reambiguates our cluster data, annotating a single page of The Raftsman’s Journal from November 4, 1868 so that each cluster points to the other witnesses we have identified of it in the Viral Texts database. As Jonathan Fitzgerald points out in his post about the exhibit, “we had to literally draw boxes around each article and then delve into our data to annotate each item.” Fitzgerald aligns the exhibit with Bethany Nowviskie’s advocacy of Neatline as a platform for visualization that is “something created minutely, manually, and iteratively, to draw our attention to small things and unfold it there.” As he notes, working minutely, with a single newspaper page, was an iterative process, and it brought out features of nineteenth-century reprinting that are not readily apparent at the scale of our database: “While we wanted to showcase the diversity of the genres available in our data, we didn’t expect to find that nearly every item on the page appeared at least once elsewhere in the corpus. Most of the time when working with our data, we’re looking at spreadsheets or running queries into large data frames, but working at the level of the individual page and tracing the connections out from there allowed us to see a quality of the data that the spreadsheets do not reveal.”

In annotating one page of one newspaper, we addressed our texts from a perspective that inverts the database view, finding something that was not apparent in the spreadsheet view of reprinted clusters: namely, the sheer amount of reprinted content on a single newspaper sheet. We also attended to several reprinted texts that would not have grabbed our attention in the database, because they were reprinted only a few times. That they were not as widely reprinted does not, of course, mean they have nothing to teach us about the nineteenth-century newspaper; it means only that the spreadsheet/database address privileges larger textual clusters, which literally appear at the top of our data output.
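The inversion can be made concrete. Think of each identified reprint as a record of (cluster, paper, date, page): the spreadsheet address ranks clusters by witness count, so small clusters sink out of sight, while the exhibit's address fixes a single page and asks which clusters touch it. A minimal sketch, with invented field names and sample records:

```python
from collections import defaultdict

# Hypothetical witness records: (cluster_id, paper, date, page)
witnesses = [
    ("c1", "Raftsman's Journal", "1868-11-04", 4),
    ("c1", "Vermont Phoenix", "1868-11-20", 1),
    ("c1", "Ohio Democrat", "1869-01-08", 2),
    ("c2", "Raftsman's Journal", "1868-11-04", 4),
    ("c3", "Vermont Phoenix", "1868-11-20", 1),
]

def cluster_view(records):
    """Database address: clusters ranked by witness count, biggest first."""
    sizes = defaultdict(int)
    for cid, *_ in records:
        sizes[cid] += 1
    return sorted(sizes.items(), key=lambda kv: -kv[1])

def page_view(records, paper, date, page):
    """Exhibit address: every cluster witnessed on one newspaper page."""
    return sorted({cid for cid, p, d, pg in records
                   if (p, d, pg) == (paper, date, page)})
```

The same records answer both questions; only the address changes. The cluster view surfaces the big cluster first and buries the singletons, while the page view shows every cluster, large or small, that shares the chosen page.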

But this is not a post about how we rediscovered close reading. Instead, I want in my final paragraphs to think through Nowviskie’s larger description of Neatline’s theory of visualization:

Neatline sees visualization itself as part of the interpretive process of humanities scholarship—not as an algorithmically-generated, push-button result or a macro-view for distant reading—but as something created minutely, manually, and iteratively, to draw our attention to small things and unfold it there. Neatline sees humanities visualization not as a result but as a process: as an interpretive act that will itself—inevitably—be changed by its own particular and unique course of creation.

This description mostly aligns with our goals for this exhibit, but not precisely. This exhibit was inspired, after all, by a textual artifact uncovered through algorithmic investigation—“a detail seen from afar”?—that drew us to this particular page of this particular historical newspaper. What is more, our “minutely, manually” created Neatline records point back into our reprints database, providing a narrative for the results of a “distant reading” project through the “small thing” of an individual historical newspaper. To say this another way, this exhibit puts two scales of address—two deformances—of the Chronicling America newspaper archive (itself a deformance of the larger print archive of historical newspapers) into direct and sometimes uneasy conversation. My feeling when browsing the exhibit, which I hope others share, is one of unease, of vacillating between vantage points, of moving rapidly between the microscope and the telescope. Both of these vantages are true, and teach us something about the object of inquiry, but zooming between them can discombobulate.

Ideally, this kind of vacillation can be an iterative and recursive process. This exhibit essentially inverts the way we have looked at project results in spreadsheets and our database, taking as its primary orientation the single newspaper page and linking from there out to a more dispersed, networked textual scene of reprinting. This leads us to a new question we might pursue at the macro scale. Could we construct an alternative view of our data, at scale, that maps our disambiguated texts back onto their source newspapers, contextualizing them not with other reprints of the same text, but with the other reprints that appeared around a particular witness? My collaborator David Smith has asked, “What news is new?” to describe how we might frame this question at the corpus scale, measuring how much of particular newspapers seems to be unique to those papers each day, each month, each year, and so forth.
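Assuming we could total, for each issue, how much of its text falls inside some identified reprint cluster, the “What news is new?” question reduces to a simple uniqueness ratio per issue, trackable across days, months, or years. The figures and field names below are invented for illustration:

```python
def uniqueness(issue_chars, reprinted_chars):
    """Fraction of an issue's text not matched to any reprint cluster."""
    if issue_chars == 0:
        return 0.0
    return 1.0 - (reprinted_chars / issue_chars)

# Hypothetical issues: (paper, date, total characters, characters in reprints)
issues = [
    ("Raftsman's Journal", "1868-11-04", 50_000, 38_000),
    ("Raftsman's Journal", "1868-11-11", 50_000, 21_000),
]

by_issue = {(paper, date): uniqueness(total, reused)
            for paper, date, total, reused in issues}
```

Plotted over time, per paper or per region, such a ratio would let us ask at corpus scale the question the exhibit poses for one page: how much of any given newspaper was "new"?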

I don’t want to make too much of this single exhibit. As Fitzgerald points out, we also built it because it was a fun way to explore some aspects of the larger project and introduce it to new readers. But even that fun is important, I think, pointing to the value of switching scales within a “big data” project to see it slant, to read it backward. In this case, our movement between scales—from clusters of reprints drawn from millions of newspaper pages to the specific reprints on an individual newspaper page—is a recursive and iterative process: work at the corpus scale suggests details worth attending to in specific newspaper issues, and time spent with those issues suggests new computational questions that could be tested across the corpus. At each stage, our observations freeze “the text” at one scale and in one form, hiding some of its attributes in order to make others more apparent. These deformances are constitutive, highlighting the very gaps that might be better understood through iteration. Like my views of a blood drop as a child, neither scale is better or more truthful: both are revelatory, even sometimes awe-inspiring.