‘Q i-jtb the Raven’: Taking Dirty OCR Seriously

The following is a talk I will deliver on January 9, 2016 for the Bibliography and Scholarly Editing Forum’s panel at MLA 2016. It is part of a longer article in progress.

On November 28, 1849 the Lewisburg Chronicle, and the West Branch Farmer published one of the most popular poems of the nineteenth century, Edgar Allan Poe’s “The Raven.”

The November 28, 1849 Lewisburg Chronicle, and the West Branch Farmer

The Lewisburg Chronicle’s “Raven” is one version among many printed after Poe’s death in 1849—“By Edgar A. Poe, dec’d”—interesting as a small signal of the poem’s circulation and reception. It is just such reprinting that we are tracing in the Viral Texts project, in which we use computational methods to automatically surface patterns of reprinting across nineteenth-century newspaper archives.

And so this version of the poem also becomes interesting as a digitized object in the twenty-first century, in which at least one iteration of the poem’s famous refrain is rendered by optical character recognition as, “Q i-jtb the Raven, ‘Nevermore’” (OCR is a term for computer programs that identify machine-readable words from a scanned page image, and is the source for most of the searchable data in large-scale digital archives). What is this text—this digital artifact I access in 2016? Where did it come from, and how did it come to be?
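
To make the distance between the OCR rendering and the printed refrain concrete, here is a minimal sketch that scores how similar two strings are. It uses Python's standard-library `difflib`, which is my choice for illustration and not a tool named in this talk; the `similarity` helper and the `unrelated` comparison string are hypothetical.

```python
from difflib import SequenceMatcher


def similarity(a: str, b: str) -> float:
    """Return a similarity ratio between 0 and 1, ignoring letter case."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


# The OCR rendering quoted above, and the refrain as it is usually printed.
ocr_line = "Q i-jtb the Raven, 'Nevermore'"
clean_line = "Quoth the Raven, 'Nevermore'"
# A comparison string with no relation to the poem (illustrative only).
unrelated = "Many facts in small compass"

print(similarity(ocr_line, clean_line))
print(similarity(ocr_line, unrelated))
```

Even with its first word destroyed, the OCR line scores far closer to the refrain than to unrelated text, which suggests why errorful OCR can still support approximate matching.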

Scale as Deformance

When I was ten years old my parents bought me a microscope set for Christmas. I spent the next weeks eagerly testing everything I could under its lens, beginning with the many samples provided in the box. I could not bring myself to apply the kit’s scalpel to the fully-preserved butterfly—which is intact still in the microscope box in my parents’ attic—but soon I had exhausted all of the pre-made slides: sections of leaves, insect wings, crystalline minerals, scales from fish or lizard skin. The kit also included the supplies to create new slides. I wanted to see blood—my blood. And so with my mom’s help I pricked the tip of my finger with a very thin needle, so I could squeeze a single drop of blood onto the thin glass slide. I remember how it smeared as I applied the plastic coverslip to the top of the slide, and I remember the sense of wonder as I first saw my own blood through the microscope’s lens. Gone was the uniform red liquid, replaced by a bustling ecosystem of red and white cells, walls and enormous spaces where none had been when I was looking with my unaided eye.

Looking at my blood through a microscope, I learned something new and true about it, but that micro view was not more true than familiar macro images. My blood is red and white cells jostling in clear plasma; my blood is also a red liquid that will run in bright-red rivulets from a pin-prick, or clot in dun-red patches over a wound. At micro-scales beyond the power of my children’s microscope, we could focus on the proteins that comprise the membrane of a red blood cell; at even more macro-scales we might consider a blood bank, organizing bags of blood by type for use in emergency rooms.

Reprinting, Circulation, and the Network Author in Antebellum Newspapers

This is a pre-print version of this article. The final, edited version appears in American Literary History 27.3 (August 2015). An accompanying methods paper co-written by me, David Smith, and Abby Mullen can be found on the Viral Texts Project site.

I. Introduction[1]

When Louis F. Anderson took over the editorship of the Houma Ceres in 1856, he admitted that he was "not…very distinguished as a 'knight of the gray goose quill,'" but assured his new readers that "our pen will not lead us into difficulty" because "our 'principal assistant,' the scissors, will be called into frequent requisition—believing as we do, that a good selection is always preferable to a bad editorial" (June 28, 1856).[2] Thus, Anderson sums up a set of attitudes toward the production, authorship, and circulation of newspaper content within a system founded on textual borrowing. In the antebellum US context, circulation often substituted for authorship; the authority of the newspaper rested on networks of information exchange that underlay its production. "Nothing but a newspaper can drop the same thought into a thousand minds at the same moment," Alexis de Tocqueville writes, describing circulation as a technology—like the rail and telegraph—compressing space and time, linking individuals around the nation by "talk[ing] to you briefly every day of the common weal" (111). In both examples, the newspaper's primary value stems from whom and how it connects.

How Not to Teach Digital Humanities

The following is a talk I’ve revised over the past few years. It began with a post on “curricular incursion”, the ideas of which developed through a talk at DH2013 and two invited talks, one at the University of Michigan’s Institute for the Humanities in March 2014 and another at the Freedman Center for Digital Scholarship’s “Pedagogy and Practices” Colloquium at Case Western Reserve University in November 2014. I’ve embedded a video from the latter presentation at the bottom of the article. There is a more polished version of the article available in Debates in the Digital Humanities 2016.

“À l’École,” Villemard (1910)

In late summer of 2010, I arrived on the campus of St. Norbert College in De Pere, Wisconsin. I was a newly-minted assistant professor, brimming with optimism, and the field with which I increasingly identified my work—this “digital humanities”—had just been declared “the first ‘next big thing’ in a long time” by William Pannapacker in his Chronicle of Higher Education column. “We are now realizing,” Pannapacker had written of the professors gathered at the Modern Language Association’s annual convention, “that resistance is futile.” So of course I immediately proposed a new “Introduction to Digital Humanities” course for upper-level undergraduates at St. Norbert. My syllabus was, perhaps, hastily constructed—patched together from “Intro to DH” syllabi in a Zotero group—but surely it would pass muster. They had hired me, after all; surely they were keen to see digital humanities in the curriculum. In any case, how could the curricular committee reject “the next big thing,” particularly when resistance was futile?

But reject it they did. They wrote back with concerns about the “student constituency” for the course, its overall theme, my expected learning outcomes, the projected enrollment, the course texts, and the balance between theoretical and practical instruction in the day-to-day operations of the class.

  1. What would be the student constituency for this course? It looks like it will be somewhat specialized and the several topics seems to suggest graduate student level work. Perhaps you could spell out the learning objectives and say more about the targeted students. There is a concern about the course having sufficient enrollment.
  2. The course itself could be fleshed out more. Is there an implied overall theme relating to digital technology other than “the impact of technology on humanities research and pedagogy”? Are there other texts and readings other than “A Companion to Digital Studies”? How much of the course will be “learning about” as distinct from “learning how to”?

My initial reaction was umbrage; I was certain my colleagues’ technological reticence was clouding their judgement. But upon further reflection—which came through developing, revising, and re-revising this course from their feedback, and learning from students who have taken each version of the course—I believe they were almost entirely right to reject that first proposal.

As a result of these experiences, I’ve been thinking more and more about the problem of “digital humanities qua digital humanities,” particularly amidst the accelerated growth of undergraduate classes that explicitly engage with digital humanities methods. In the first part of this talk, I want to outline three challenges I see hampering truly innovative digital pedagogy in humanities classrooms. To do so, I will draw on my experiences at two very different campuses—the first a small, relatively isolated liberal arts college and the second a medium-sized research university—as well as those of colleagues in a variety of institutions around the country.

As an opening gambit, I want to suggest that undergraduate students do not care about digital humanities. I want to suggest further that their disinterest is right and even salutary, because what I really mean is that undergrads do not care about DH qua DH. In addition, I don't think most graduate students in literature, history, or other humanities fields come to graduate school primarily invested in becoming "digital humanists," though there are of course exceptions.

“Many Facts in Small Compass”: Information Literature in C19 Newspapers (MLA15 Talk)

Ryan Cordell, Northeastern University

MLA 2015 | Vancouver, BC

My remarks today will be drawn from my work on the Viral Texts project at Northeastern University. In brief, I’m working with a colleague in computer science to automatically identify the most frequently-reprinted texts in digitized archives of nineteenth-century newspapers. We have thus far drawn from the Library of Congress’ Chronicling America collection, but are currently expanding the corpora from which we are drawing to include magazines, as well as a broader selection of American and transatlantic newspapers. We have identified nearly half a million reprinted texts from the LoC’s nineteenth-century holdings. The majority of these were reprinted only a few times, but a significant minority were reprinted in 50, 100, or even 200 newspapers from this one archive.
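
One simple way to picture automatic reprint detection is to compare the short word sequences two passages share. The sketch below is an illustrative assumption on my part, not the project's actual algorithm: the `shingles` and `jaccard` helpers, the three-word window, and the sample passages (including the invented OCR errors and the invented unrelated sentence) are all hypothetical.

```python
import re


def shingles(text: str, n: int = 3) -> set:
    """Return the set of overlapping n-word sequences in a text."""
    words = re.findall(r"[a-z]+", text.lower())
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}


def jaccard(a: set, b: set) -> float:
    """Return shared shingles divided by all distinct shingles."""
    if not a or not b:
        return 0.0
    return len(a & b) / len(a | b)


clean = ("Once upon a midnight dreary, while I pondered, weak and weary, "
         "Over many a quaint and curious volume of forgotten lore")
# The same lines with two invented OCR errors: "drcary" and "curions".
noisy = ("Once upon a midnight drcary, while I pondered, weak and weary, "
         "Over many a quaint and curions volume of forgotten lore")
# An invented sentence standing in for unrelated newspaper content.
other = ("The committee met on Tuesday to discuss the price of wheat "
         "and the state of the roads")

print(jaccard(shingles(clean), shingles(noisy)))
print(jaccard(shingles(clean), shingles(other)))
```

Because each misread word spoils only the few sequences that contain it, two noisy copies of one text still overlap heavily, while unrelated passages share almost nothing.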

We went into this project in search of the literature, such as newspaper poetry, that flourished in a print culture founded on textual sharing and through a deeply hybrid and intertextual medium. In the broadest sense, I hoped to expand our ideas of which writers resonated with nineteenth-century readers and create new bibliographies of popular but critically-overlooked literature.

On this front the project has been promising. For every reprinted Longfellow poem we find many more by authors such as Elizabeth Akers Allen, Isabella Banks, Charles Monroe Dickinson, Colonel Theodore O’Hara, Emily Rebecca Page, Nancy Priest Wakefield, or John Whitaker Watson—or, perhaps even more likely, by an anonymous author. Such poems circulated within a system of exchanges and selection—newspaper editors cut, pasted, and recomposed content from their exchange partners and sent their papers to be similarly aggregated elsewhere.

But recognizably literary genres have been only a small part of the project. One of the most dramatic outcomes of this work thus far has been to highlight the importance of understudied genres of everyday reading and writing within the ecology of nineteenth-century print culture. These species of writing include political news, travel accounts, squibs, scientific reports, inspirational or religious exhortations, temperance narratives, vignettes, self-help guides, trivia, recipes, and even, to borrow a modern Internet term, listicles, all of which were juxtaposed with poems, stories, and news on the page of the nineteenth-century paper. As a general (and perhaps unsurprising) rule, the most frequently-reprinted pieces are concise, quotable, and widely relatable texts that would have been easy to recontextualize for different newspapers and new audiences—and that could easily fit gaps in the physical newspaper pages, as editors and compositors needed.

My remarks today focus on those genres we might categorize as "information literature": lists, tables, recipes, scientific reports, trivia columns, and so forth. I want to separate these from news itself, which is certainly a kind of information genre, but which I would mark as stylistically and operationally distinct from the other genres I've listed. Here's one example of information literature, a list of supposed "facts," primarily about human lives and demographics, which was published under many names in at least 120 different newspapers between 1853 and 1899 (which is approximately one quarter of the nineteenth-century newspapers in Chronicling America).

On Ignoring Encoding

Lately we’ve seen a spate of articles castigating the digital humanities—perhaps most prominently, Adam Kirsch’s piece in New Republic, “Technology Is Taking Over English Departments: The False Promise of the Digital Humanities.” I don’t plan in this post to take on the genre or refute the criticisms of these pieces one by one; Ted Underwood and Glen Worthey have already made better global points than I could muster. My biggest complaint about the Kirsch piece—and the larger genre it exemplifies—would echo what many others have said: these pieces purport to critique a wide field in which their authors seem to have done very little reading. Also, as Roopika Risam notes, many of these pieces conflate “digital humanities” with the DH that happens in literary studies, leaving digital history, archeology, classics, art history, religious studies, and the many other fields that contribute to DH out of the narrative. In this way these critiques echo conversations happening within the DH community about its diverse genealogies, such as Tom Scheinfeldt’s “The Dividends of Difference,” Adeline Koh’s “Niceness, Building, and Opening the Genealogy of the Digital Humanities,” or Fiona M. Barnett’s “The Brave Side of Digital Humanities.”

Even taken as critiques of only digital literary studies, however, pieces such as Kirsch’s problematically conflate “big data” or “distant reading” with “the digital humanities,” seeing large-scale or corpus-level analysis as the primary activity of the field rather than one activity of the field, and explicitly excluding DH’s traditions of encoding, archive building, and digital publication. I have worked and continue to work in both these DH traditions, and have been struck by how reliably one is recognized—to be denounced—while the other is ignored or disregarded. The formula for denouncing DH seems at this point well established, though the precise order of its elements sometimes shifts from piece to piece:

  1. Juxtapose Aiden and Michel’s “culturomics” claims with the stark limitations of the Ngrams viewer.
  2. Cite Stephen Ramsay’s “Who’s in and Who’s Out,” specifically the line “Do you have to know how to code? I’m a tenured professor of digital humanities and I say ‘yes.'” Bemoan the implications of this statement.
  3. Discuss Franco Moretti on “distant reading.” Admit that Moretti is the most compelling of the DH writers, but remain dissatisfied with the prospects for distant reading.

These critiques are worth airing, though they’re not particularly surprising—if only because the DH community has been debating these ideas in books, blog posts, and journal articles for a long while now. Matt Jockers’ Macroanalysis alone could serve as a useful introduction to the contours of this debate within the field.

More problematically, however, by focusing on Ramsay and Moretti, these pieces ignore the field-constitutive work of scholars such as Julia Flanders, Bethany Nowviskie, and Susan Schreibman. This vision of DH is all Graphs, Maps, Trees and no Women Writers Project. All coding and no encoding.

Mea Culpa: on Conference Tweeting, Politeness, and Community Building

Kathleen Fitzpatrick’s post “If You Can’t Say Anything Nice,” about public shaming on Twitter, came at a timely moment for me. Describing the culture of Twitter commentary, she writes:

You get irritated by something — something someone said or didn’t say, something that doesn’t work the way you want it to — you toss off a quick complaint, and you link to the offender so that they see it. You’re in a hurry, you’ve only got so much space, and (if you’re being honest with yourself) you’re hoping that your followers will agree with your complaint, or find it funny, or that it will otherwise catch their attention enough to be RT’d.

I’ve done this, probably more times than I want to admit, without even thinking about it. But I’ve also been on the receiving end of this kind of public insult a few times, and I’m here to tell you, it sucks.

I read this post while at a conference, and as I read it realized that I’d been guilty of just this kind of ungenerous commentary earlier in the day. I’d disagreed strongly with one of the presenters and written a series of critiques on Twitter, which many in my community found pithy and retweeted. Let me say: I absolutely believed in what I wrote, and I don’t retract the ideas. But in the Twitter exchanges around those posts, some of the conversation got more personal. The presenter (a fellow academic and human being named Elaine Treharne, not some nameless person) read those exchanges after the panel and was deeply hurt by them. She was right. I was wrong. I tweeted an apology, but the entire affair, coupled with Kathleen’s post, kept working on me. I ended up chatting with Elaine for several hours yesterday evening about electronic fora, professionalism, and valid critique through channels such as Twitter. I think we both learned quite a bit; I know I learned quite a bit. We still don’t entirely agree on the substantive points from her presentation, but I hope we’re now friends as well as colleagues. She agreed to let me use her name in this post.

After yesterday’s experiences and conversations, I spent the evening considering my tweets over the past several conferences I’ve attended, including in the much-ballyhooed “Dark Side of DH” panel at MLA in Boston. Kathleen is absolutely right: our field needs to seriously consider both how our current Twitter culture developed, and how it might need to change moving forward. I need to seriously consider how I engage with colleagues on Twitter; I am not blameless and I need to reform. This post is my attempt to start thinking through both how the current Twitter culture came to be and how we might change. The post owes any of its insights to Elaine’s generous willingness to talk seriously with me about these issues after being flamed by my community on Twitter.

Only a few years ago, DH was still a fringe field, mostly ignored by academia more widely. DHers felt not like “the next big thing,” but like an embattled minority. The community was very small, and the worry at conferences was about how to convince our colleagues that what we did was valuable. How can we get hired; how can we get promoted? How can we persuade the field to pay attention to this work we find remarkable? DHers were overrepresented in online fora such as Twitter, though, which became a place to build support communities for DH scholars who felt isolated on their campuses and within the wider academic community.

Within that framework, the back-chatter on Twitter was a valuable support mechanism. I remember sitting in a conference panel in my disciplinary field—nineteenth-century American literature—a few years ago when an eminent professor described the utter vapidity of modern reading practices (uncharitably: “kids these days with their screens! and their ADD!”) compared to those of 150 years ago. Around the room, heads were nodding vigorously, and in the Q&A many other prominent members of my field rose to concur.

In that room, I felt like the oddball. My intellectual interests were being dismissed out of hand by the very people likely to decide whether my work would be published (and thus, whether I would get a job, get tenure, &c., &c.). I disagreed with them vehemently, but as a junior scholar was hesitant to challenge the rising consensus in the room, for fear that would further isolate me. And so I turned to Twitter to remind myself that I did have a community who would welcome my ideas on these issues. I tweeted my frustrations—I conferred with my dispersed but friendly DH community—and found support and engagement. Perhaps this doesn’t excuse public snarkiness, but that snark was a way of building community—certifying the value of unpopular interests and opinions. None of the eminent panelists from that session I attended read those conversations, nor would have. Nobody got hurt, and I felt less embattled and more prepared to go on with my work.

But that was several years ago, when I had far fewer followers on Twitter, and when DH was not at the center of the academy’s attention. Today many more academics, including those not heavily involved in DH, are on Twitter. And rather than being a nearly-ignored, fringe element of the academy, prominent DHers are being looked to as gatekeepers into a much-desired field. Panelists know to investigate how their sessions were tweeted, and they care what was said about them online. What’s more, many of our colleagues now know how to find tweets about them even when those tweets don’t include their names or usernames. We cannot assume that anonymous tweeting will do no harm to the colleagues we discuss. Tweets are not semi-private, whispered conversations in the back of the conference room; our tweets are very public and could unfairly shape public perception of the colleagues we discuss in them.

Within this framework, the same kind of Twitter chatter that helped build DH communities only a few years ago can resonate with newcomers to the field precisely as that vigorous denunciation of “technology” resonated for me as a young nineteenth-century Americanist. In other words, Twitter chatter can easily read not as community building, but as insider dismissal and exclusion. Such exchanges belie claims that DH is an open field, instead alienating scholars attempting to engage with it. We are no longer the upstarts; we are increasingly seen as the establishment. While this perception doesn’t exactly line up with reality, it certainly shapes the way our Twitter conversations—and in turn the wider DH field—are perceived by newcomers to it. In Elaine’s case, she felt she was being dismissed out of hand by scholars whose work she knows and respects; we had convinced her that she didn’t belong in DH. This is a terrible outcome our field should be wary of replicating.

Nevertheless, I remain firmly convinced that Twitter conversations can supplement and enrich academic conferences, providing a record of their proceedings, allowing scholars to engage actively with their presenting colleagues, and providing access to conferences to those scholars who cannot attend. But as a community, we need to think hard about how to retain the value of conference tweeting while mitigating its alienating effects on our colleagues. This does not mean, I think, refraining from any critique on Twitter, but it will mean remembering when crafting those critiques that there are real people on the receiving end.

Principles of Conference Tweeting

Going forward, I’m going to try and tweet conference panels following these principles.

  1. I will post praise generously, sharing what I find interesting about presentations.
  2. Likewise, I will share pertinent links to people and projects, in order to bring attention to my colleagues’ work.
  3. When posting questions or critiques, I will include the panelist’s username (an @ mention) whenever possible.
  4. If the panelist does not have a username—or if I cannot find it—I will do my best to alert them when I post questions or critiques, rather than leaving them to discover those engagements independently.
  5. I will not post questions to Twitter that I would not ask in the panel Q&A.
  6. I will not use a tone on Twitter that I would not use when speaking to the scholar in person.
  7. I will avoid “crosstalk”—joking exchanges only tangentially related to the talk—unless the presenter is explicitly involved in the chatter.
  8. I will refuse to post or engage with posts that comment on the presenter’s person, rather than the presenter’s ideas.

I am not calling for an embargo on conference tweeting, or for engagements exclusively devoted to agreement or confirmation. To turn conference tweeting into a tepid, timid echo chamber would not serve DH or the wider academy. But as the DH field grows and newcomers attempt to engage with it, we must consider the effect our chatter might have on them. I don’t want to make newcomers to DH feel as isolated as I felt in that room of eminent Americanists. Changing my public presentation on Twitter seems a small concession—worth making—if it will prevent that happening.

Thanks to Flickr users digitalART2, exquisitur, and brx0 for the Creative-Commons photos embedded here.

Thanks, Greg.

I’ve been moving through today slowly. I learned yesterday—first through Twitter, and then through several emails—that Greg Colomb, the Director of UVA’s Writing Program, passed away in his sleep. Judging by the reactions of friends in Charlottesville, this news was sudden and shocking. Several wrote on Facebook of meeting with Greg only days ago. They wrote that he was in those meetings his typical, jovial self. I was certainly surprised. I met Greg for coffee when I was in Charlottesville this summer. We spoke on the phone only a few weeks ago, planning a panel we were supposed to present together at CCCC this March. In those meetings he was full of life, full of energy, full of ideas. He was, in other words, Greg.