Introduction to the Digital Humanities
Dr. Ryan Cordell, June 12, 2012
I. What is DH? (2:00pm-2:45pm)
Before today’s session you all read the following articles:
- Matthew Kirschenbaum, “What is Digital Humanities and What’s it Doing in English Departments?”
- All six articles in the New York Times’ series, “Humanities 2.0”
As a class, we’ll use these articles (and our discussion) to do the following:
- Collaborate on a basic, working definition of “digital humanities”;
- List the various kinds of methodologies that comprise the DH field;
- Identify the common priorities, concerns, and/or values of DH practitioners;
At the end of this crowdsourcing session, I will present some of my answers to these same questions. You can view my slides here.
II. Not Reading C19 Novels: Research (2:45-3:30)
The idea for this workshop was stolen from my colleague Paul Fyfe of Florida State University. He describes his version of the assignment in “How Not to Read a Victorian Novel,” Journal of Victorian Culture 16, no. 1 (April 2011). Here’s how Paul introduces the assignment for his students:
Franco Moretti was dissatisfied with how literary scholars accept just a handful of possible texts as representative of cultural eras. Even if those texts are diverse and interesting, how can they possibly represent broader trends at scale? Moretti wants to change our sense of literary history by enlarging it, or by increasing our critical distance from it. He coined the phrase “distant reading” as an approach to analyzing lots and lots of texts instead of an unrepresentative few. Distant reading uses other modes of analysis and models of interpretation than the “close reading” we are familiar with. In his own work, Moretti compiles textual information from lots and lots of novels into maps, graphs, and logical trees. Seen this way, texts can reveal patterns and language trends that we could not discover close up. An array of digital visualization and text analysis tools now makes Moretti’s methods more accessible to the casual user. The first paper will be an experiment in using these tools. We will consider “distance” not only as the subject of our course but also as a potential mode of reading and interpretation. What do literary criticism and analysis look like if we accept distance “as a condition of knowledge”?
Distance is a pretty good approach to the Victorian novel, considering that 40,000+ books of prose fiction were published in the last two-thirds of the nineteenth century. No one can read them all. But perhaps we can learn how to not read them. As Moretti and others have demonstrated, digital technology provides lots of interesting ways of doing this. Using some selected tools, you will analyze a big Victorian novel and then write a paper explaining your questions and insights. There’s one catch: it has to be a book you have never read.
English classes more typically emphasize close reading than “not reading.” This exercise will be new to many of you. So will the technology and the interfaces. The paper requires thinking about texts in a very different way than you might be used to. There may be dead ends; on the other hand, there will be no wrong answers. This leads to three important points:
- Play. Experiment. This assignment is as much about testing the methods as it is about learning about the text. The goal here is not to reconstruct a missing story, but to “read” the novel in a fundamentally different way, and to think about the implications of doing so.
- Ask for help. Please don’t struggle alone with the technology, or tear your hair out in confusion about the assignment. Visit my office hours or email me for an appointment if you’d like to go over this, work out a problem, or discuss how to talk about your results.
- Use frustration creatively. This is perhaps the hardest and most essential trick. If you hit a dead end, feel frustrated, or get null results, how can you use that to learn? In other words, what might be the value of that frustration or failure in thinking about your critical approach? Try to treat any moment of frustration as an opportunity to reflect on the kinds of questions you are asking and how you might change them.
Ready to get started?
So here’s how you should proceed for this workshop:
Find a partner to work with.
Choose a big C19 novel to not read.
Remember that you must choose something you’ve never read before. Perhaps you’ll pick a famous novel you’ve always wanted to read but could never find the time for. You must also choose a novel whose entire text you can find online, most likely on Project Gutenberg. A few possibilities (don’t tell Dr. Fyfe, but I’ve snuck a few American works into this list):
- Louisa May Alcott, Little Women
- Charlotte Brontë, Jane Eyre
- Emily Brontë, Wuthering Heights
- Charles Chesnutt, The Marrow of Tradition
- Martin Delany, Blake; or the Huts of America
- Charles Dickens, Bleak House, A Tale of Two Cities, or David Copperfield
- George Eliot, Middlemarch
- Harold Frederic, The Damnation of Theron Ware
- Elizabeth Gaskell, North and South
- Thomas Hardy, Tess of the d’Urbervilles
- Henry James, The Portrait of a Lady (Volume 1 and Volume 2)
- Herman Melville, Moby Dick
- Bram Stoker, Dracula
- William Thackeray, Vanity Fair
- Susan Warner, The Wide, Wide World
Make word clouds
When provided with a bunch of text, tag cloud or word cloud engines will return you a graphical representation of the most common words: the more frequently a word appears in the text, the larger it appears relative to other words on the screen.
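For those curious about what happens under the hood, here is a minimal Python sketch of the idea, assuming a simple tokenizer and a linear mapping from frequency to font size. Wordle’s actual tokenizing, scaling, and layout rules are its own and are not reproduced here; the function names and the size range below are illustrative choices, not part of any tool.

```python
import re
from collections import Counter

def word_frequencies(text):
    """Count lowercase word tokens; apostrophes are kept only inside words (e.g. "don't")."""
    tokens = re.findall(r"[a-z]+(?:'[a-z]+)*", text.lower())
    return Counter(tokens)

def font_sizes(freqs, top_n=50, min_size=10, max_size=60):
    """Give each of the top_n words a font size that grows linearly with its count.

    The linear scale is an assumption made for illustration; real word cloud
    engines may use other scales (square root, logarithmic, and so on).
    """
    most_common = freqs.most_common(top_n)
    if not most_common:
        return {}
    highest = most_common[0][1]
    lowest = most_common[-1][1]
    span = (highest - lowest) or 1  # avoid dividing by zero when all counts are equal
    return {
        word: min_size + (count - lowest) * (max_size - min_size) / span
        for word, count in most_common
    }
```

Calling `font_sizes(word_frequencies(text))` on a novel returns its fifty most frequent words, each paired with a size; arranging those words on the screen is the part Wordle handles for you.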
Wordle is nice for making word clouds because, once your word cloud gets generated, you can toggle common English words (e.g. and, the, if) on or off, and you can customize or even “randomize” the display, allowing you to see different visualizations of the data. Using the text of your chosen novel, experiment with Wordle until you get comfortable with the interface. Then run a couple of different tests with Wordle, making notes of your observations along the way:
- Generate a cloud for the whole text by copying and pasting the entire text into the Wordle box (make sure not to paste the header and footer information, only the text itself). How might you “read” this? Come up with a few different observations. What kinds of words are there? Are there patterns or in/consistencies in the words? What words (or kinds of words) are relatively more or less frequent?
- Try breaking the book into chapters or sections (many Victorian novels were first published in monthly parts or in three volumes). Paste individual sections in, generate word clouds, and see what you can regenerate from a “distant” perspective.
- Play with stoplists: in Wordle, toggle on/off the common English words. (You can also create your own custom stoplist, which is a little more advanced.)
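If you want to prepare your text before pasting it, the three tests above can also be approximated in a short Python sketch. It assumes your e-text uses Project Gutenberg’s `*** START OF ...` and `*** END OF ...` marker lines and that chapters begin with lines like `CHAPTER I`; many files follow these conventions but not all do, so check yours. The helper names and the tiny stoplist are hypothetical illustrations, not Wordle’s.

```python
import re
from collections import Counter

# Assumption: the novel sits between marker lines such as
# "*** START OF THE PROJECT GUTENBERG EBOOK ..." and "*** END OF THE PROJECT GUTENBERG EBOOK ...".
# Older files say "THIS" rather than "THE"; the patterns accept either.
START = re.compile(r"^\*\*\* ?START OF (?:THE|THIS) PROJECT GUTENBERG.*$", re.MULTILINE | re.IGNORECASE)
END = re.compile(r"^\*\*\* ?END OF (?:THE|THIS) PROJECT GUTENBERG.*$", re.MULTILINE | re.IGNORECASE)

# Assumption: each chapter opens with a line like "CHAPTER I" or "Chapter 12".
CHAPTER = re.compile(r"^chapter[ \t]+[0-9ivxlcdm]+\b.*$", re.MULTILINE | re.IGNORECASE)

# A tiny illustrative stoplist; Wordle's built-in list is longer and is not reproduced here.
STOPWORDS = {"the", "and", "a", "an", "of", "to", "in", "it", "if", "i", "he", "she", "was", "chapter"}

def strip_gutenberg(text):
    """Keep only the text between the start and end markers (or the whole text if they are missing)."""
    start = START.search(text)
    end = END.search(text)
    begin = start.end() if start else 0
    finish = end.start() if end else len(text)
    return text[begin:finish].strip()

def split_chapters(text):
    """Split the body at chapter headings, dropping anything before the first heading."""
    starts = [m.start() for m in CHAPTER.finditer(text)]
    if not starts:
        return [text]
    bounds = starts + [len(text)]
    return [text[a:b].strip() for a, b in zip(bounds, bounds[1:])]

def top_words(text, n=10, stoplist=STOPWORDS):
    """Most frequent words after removing everything in the stoplist."""
    tokens = re.findall(r"[a-z]+(?:'[a-z]+)*", text.lower())
    return Counter(t for t in tokens if t not in stoplist).most_common(n)
```

Comparing `top_words(chapter)` for each item in `split_chapters(strip_gutenberg(raw_text))` gives a rough, list-shaped version of the section-by-section clouds; passing an empty set as `stoplist` shows what toggling the common words back on does.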
Reveal your texts
Word clouds are a first step (on the ProfHacker blog, Julie Meloni called word clouds a “gateway drug” to textual analysis). Next, we will run (slightly) more sophisticated text analysis software on your novel using tools provided by Voyant. (Voyant has had server troubles lately; if that link doesn’t work, use this link to the software on another server.) Copy the text of your chosen novel into the box and click “reveal.” Initially Voyant’s results will look much like Wordle’s. You’ll see a word cloud in the top left corner of the screen, a summary of results below it, and the text of your chosen novel in the center. If you click “more…” in the summary window, however, another window will open below it showing the “words in the entire corpus.” “Corpus” means “a collection of written works,” and Voyant can be used to analyze many texts together; in this case, however, your corpus is one novel.
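To make the idea of a corpus concrete, the following sketch (an illustration, not Voyant’s own code) treats a corpus as a dictionary of titles and texts, then computes combined word counts across all documents along with simple per-document totals. With one novel, the corpus-wide counts are just that novel’s counts. Voyant’s tokenization and summary statistics may differ from this simple version, and the function names are hypothetical.

```python
import re
from collections import Counter

def tokenize(text):
    """Lowercase word tokens; apostrophes are kept only inside words."""
    return re.findall(r"[a-z]+(?:'[a-z]+)*", text.lower())

def corpus_counts(corpus):
    """corpus: dict mapping each document's title to its full text.

    Returns (combined, summary): word counts across the whole corpus, and
    total and unique word counts for each document.
    """
    combined = Counter()
    summary = {}
    for title, text in corpus.items():
        tokens = tokenize(text)
        combined.update(tokens)
        summary[title] = {"total_words": len(tokens), "unique_words": len(set(tokens))}
    return combined, summary
```

`combined.most_common(20)` would then give something like the “words in the entire corpus” list, ordered by frequency.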
Look at the words by frequency. You might have to scroll through a few pages before you get past common words such as “the,” “and,” and so on. What are the first few less common words that appear most frequently in your novel? Double-click one of the words listed, and a new set of tools will open on the right side of the window. You can look at “word trends,” which plots the relative frequency of words at different points in your novel. Below this you can click to open “Keywords in context,” which shows the words that appear around the word you’re analyzing within the text. If you look at the text in the center of the window, you’ll see that there’s now a “heat map” running along its left-hand margin which shows where your chosen word appears most frequently within the text. Jot down some notes about this word, and then compare those results with several other words in the “Words in the Entire Corpus” menu.
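The “word trends” and “keywords in context” views can be approximated in a few lines of Python. This sketch assumes that word trends divide the text into equal slices and report what share of each slice a word takes up, and that keywords in context show a fixed number of words on either side; Voyant’s own segmenting and display options may differ, and the function names are hypothetical.

```python
import re

def tokenize(text):
    """Lowercase word tokens; apostrophes are kept only inside words."""
    return re.findall(r"[a-z]+(?:'[a-z]+)*", text.lower())

def word_trend(tokens, word, segments=10):
    """Relative frequency of `word` in each of `segments` roughly equal slices of the text."""
    n = len(tokens)
    bounds = [i * n // segments for i in range(segments + 1)]
    trend = []
    for a, b in zip(bounds, bounds[1:]):
        chunk = tokens[a:b]
        trend.append(chunk.count(word) / len(chunk) if chunk else 0.0)
    return trend

def kwic(tokens, word, window=3):
    """Keyword in context: every occurrence of `word` with `window` words on each side."""
    lines = []
    for i, token in enumerate(tokens):
        if token == word:
            left = " ".join(tokens[max(0, i - window):i])
            right = " ".join(tokens[i + 1:i + 1 + window])
            lines.append(f"{left} [{token}] {right}".strip())
    return lines
```

Plotting `word_trend(tokenize(novel), "whale")` as a line chart gives a rough equivalent of the trends graph, while `kwic(...)` prints one line per occurrence, much like a printed concordance.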
Some questions to consider as you play with Voyant: does more focused attention to word frequency change your opinions about your book? What about scarce or infrequent words? What still don’t you know? In other words, what additional information might you need to gain insights? What insights, if any, do these tools provide? What keywords or patterns did you pursue and why? What might you suspect are the values and/or limitations of “not reading” this way? Where might it be useful in future research projects or in analyzing other kinds of texts?
(if you have time) Explore the wonderful world of Ngrams
Google’s Ngram Viewer displays the frequency of words over time by drawing on the massive Google Books corpus, which includes the text of more than 15 million books. For more on Ngrams, check out the Culturomics site. Choose several of the words you’ve concentrated on in your previous analyses and enter them into the Ngram Viewer. Look at the frequency of those words through time, paying particular attention to their frequency when your chosen novel was published. Do any of them stand out, either as particularly common words during their time or, perhaps as interestingly, as particularly uncommon words during their time? Try a few more words from the frequency lists you generated in Voyant earlier. The big question here: can a tool like the Ngram Viewer, which analyzes so many texts, help you understand anything about the historical place of a book you’ve never read?
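An n-gram is simply a run of n consecutive words: “white whale” is a 2-gram, or bigram. As we understand Google’s description, the Ngram Viewer’s y-axis shows each phrase as a share of all n-grams of the same length in that year’s books. The Python sketch below mimics that calculation on a small collection you supply yourself, as a dictionary mapping years to lists of texts; the function names are hypothetical, and it shares neither Google’s scale nor its tokenization.

```python
import re
from collections import Counter

def tokenize(text):
    """Lowercase word tokens; apostrophes are kept only inside words."""
    return re.findall(r"[a-z]+(?:'[a-z]+)*", text.lower())

def ngrams(tokens, n):
    """All runs of n consecutive tokens, joined with spaces."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def yearly_relative_frequency(texts_by_year, phrase):
    """texts_by_year: dict of year -> list of texts published that year.

    For each year, returns the phrase's count divided by the total number of
    n-grams of the same length in that year's texts (0.0 if there are none).
    """
    target = tokenize(phrase)
    n = len(target)
    key = " ".join(target)
    result = {}
    for year, texts in sorted(texts_by_year.items()):
        counts = Counter()
        for text in texts:
            counts.update(ngrams(tokenize(text), n))  # n-grams never span two texts
        total = sum(counts.values())
        result[year] = counts[key] / total if total else 0.0
    return result
```

Dividing by each year’s total, rather than reporting raw counts, keeps years with many more surviving books from dominating the graph.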
III. Not Reading C19 Novels: Presentations (3:30-4:00)
- Review the notes from your research and prepare a short (3-4 minute) presentation for the class about your work. What did “distant reading” teach you about your chosen work? What did it fail to teach you?
- Each pair will present their findings in 3-4 minute presentations to the class. These will be highly informal and allow for plenty of conversation.
IV. A Project-Based Field: Research (4:00-5:00)
The best way to really get a handle on what digital humanists do is to engage with digital humanities projects. In our remaining time together, we will look at sample digital humanities projects, including:
- Bracero History Archive, http://braceroarchive.org/
- Brown University Women Writers Project, http://www.wwp.brown.edu/
- Civil War Washington, http://civilwardc.org/
- Digitizing “Chinese Englishmen,” http://chineseenglishmen.adelinekoh.org/
- Hypercities, http://hypercities.com
- Juxta, http://www.juxtasoftware.org/
- Visualizing Emancipation, http://dsl.richmond.edu/emancipation/
As we investigate these projects, consider the following questions:
- In his talk “Scholarly Primitives,” John Unsworth argues that all scholarship makes use of the same basic functions, such as discovering, annotating, and comparing. In what ways does the project meet these scholarly needs?
- What assumptions have been made in designing the project? (What are their sources? How is the site designed? &c. &c.)
- What is the project’s primary audience? Is it addressed to other researchers, students, or both?
- What are the project’s strengths and weaknesses?
- The big one: what do you think this project contributes to the larger body of knowledge in its field?
If you’re interested in digging deeper into the digital humanities, you should read Lisa Spiro’s article “Getting Started in the Digital Humanities.” It offers a wonderful, step-by-step plan for familiarizing yourself with the field.