Two (of Three) Ways of Looking at C19 Newspaper Exchange Networks

I wrote the following as part of my preparation for next week’s second meeting of the NHC Summer Institute in Digital Textual Studies. The post assumes a modest working understanding of network graphs and their terminology. For a primer on humanities network analysis, see the links for my network analysis workshop or, more specifically, see Scott Weingart’s ongoing series Demystifying Networks, beginning, appropriately enough, with his introduction, then his second post about degree, and possibly his post on communities.

Introduction

In previous work in American Literary History, I argued that reprinted nineteenth-century newspaper selections should be considered as authored by the network of periodical exchanges. Such texts were assemblages, defined by circulation and mutability, that cannot cohere around a single, stable author. As part of this argument, I demonstrated how social network analysis (SNA) methods might employ large-scale data about reprinting to illuminate lines of influence among newspapers during the period. In that early network modeling, I represented individual newspapers from our reprinting data (at the time drawn primarily from the Library of Congress’ Chronicling America collection) as nodes, connected by edges that represented texts printed in common between papers. Those edges were weighted by frequency of shared reprints. The working assumptions behind those models were these: 1.) the fact that two newspapers reprint this or that text in common says very little about their relationship, or lack thereof, during the period and 2.) when two newspapers printed hundreds, thousands, or even tens of thousands of texts in common, this fact is a strong signal of a potential relationship between them.

A selection from a single cluster in the Viral Data. Each line represents a specific reprint from the larger cluster, which is identified by the ID in the first column. You can browse the cluster data I used for these experiments. These are themselves experimental clusters using a new version of the reprint-detection algorithm, and are not yet suitable for formal publication.

Our data about reprinting in the Viral Texts Project is organized around “clusters”: these are, essentially, enumerative bibliographies of particular texts that circulated in nineteenth-century newspapers, derived computationally through a reprint detection algorithm that we describe more fully in previous publications.1 From these chronologically ordered lists of witnesses, we derive network structures by tallying how often publications appear in the same clusters. When two publications appear together in a particular cluster, they are considered linked, with an edge of weight 1. Each subsequent time those same publications appear together in other clusters, the weight of their edge increases by 1; ten shared reprints result in a weight of 10, one hundred shared reprints in a weight of 100. Thus the final network data shows strong links between publications that often print the same texts and weaker links between publications that occasionally print the same texts.
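The tallying described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the project's actual code: the function name, the cluster IDs, and the newspaper titles are all hypothetical, and I assume that a publication appearing more than once in the same cluster still counts only once toward a given pair's weight.

```python
from collections import Counter
from itertools import combinations


def build_edge_weights(clusters):
    """Tally how often each pair of publications shares a cluster.

    `clusters` maps a cluster ID to the list of publications (witnesses)
    in that cluster. Returns a Counter keyed by alphabetically sorted
    (publication, publication) pairs, so the edges are undirected.
    """
    weights = Counter()
    for witnesses in clusters.values():
        # Assumption: repeated appearances within one cluster count once.
        publications = sorted(set(witnesses))
        # Every pair that co-occurs in this cluster gains 1 in weight.
        for pair in combinations(publications, 2):
            weights[pair] += 1
    return weights


# Hypothetical sample data, invented purely for illustration.
sample_clusters = {
    "cluster-1": ["Gazette", "Herald", "Tribune"],
    "cluster-2": ["Gazette", "Herald"],
    "cluster-3": ["Herald", "Herald", "Tribune"],
}

for (source, target), weight in sorted(build_edge_weights(sample_clusters).items()):
    print(source, target, weight)
```

Each resulting pair and its count could then be written out as a row of a weighted, undirected edge list for use in a network analysis tool.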

Network Analysis Workshop

I regularly run workshops on humanities network analysis. For participants, I’ve compiled some starting instructions, sample data files, and suggested reading below.

Recommended Reading

Tools for Network Analysis

There are many options at various skill levels for humanists interested in network analysis. Here are just a few:

  • If you’re looking for an especially straightforward platform for basic network analyses, you might check out Palladio, which adapts the platform designed for Stanford’s Mapping the Republic of Letters project for other scholars’ use. Martin Düring’s tutorial at the Programming Historian focuses on extracting network data from unstructured text and visualizing it in Palladio, and Miriam Posner’s “Getting Started with Palladio” introduces the tool’s network functionalities (along with much else).
  • You can also create basic network graphs using Fusion Tables.
  • If you are running Windows with Microsoft Excel installed, NodeXL aims to make generating network graphs from an Excel spreadsheet as easy as creating a pie chart. Unfortunately, NodeXL is incompatible with Mac versions of Excel.
  • And of course, if you’re comfortable with programming languages, there are plenty of methods for generating network graphs by hand. Taylor Arnold and Lauren Tilton write about using R for network analysis in Humanities Data in R, and Lincoln Mullen has a growing resource in Digital History Methods in R, including an in-progress chapter on networks.

This Workshop: Gephi

For this workshop, we will be using Gephi, one of the most widely used tools for network analysis and visualization. You will need to download and install the application before we can get started. If you find it runs slowly (or not at all), you might need to update Java on your system.

Workshop Data

Sample data files can be found in this folder. You can download them all as a zip file or download files separately as we need them.