Towards a Bibliography for AI Systems

The following remarks were prepared for The Andrew W. Mellon Society of Fellows in Critical Bibliography forum on “Preserving and Analyzing Digital Texts.”

In 2022 new AI image, text, video, and music models seem to appear every week, spurring waves of attention in social media and forcing constant reconsiderations of the horizons of such technology, as well as its implications for research and teaching. Most dramatic was the release of ChatGPT,¹ a large language model developed by OpenAI and released to much fanfare and critique in late 2022. Since that release this model has generated significant enthusiasm and anxiety among different stakeholders, from companies eager to integrate it into applications from writing assistance to search, to professors worried students will outsource writing their college papers to the technology.² The debate about ChatGPT has in many ways become a proxy for larger debates about algorithmic accountability, fairness, and reliability. In the mere months since ChatGPT was released, a yet newer generation large language model, which OpenAI promises will be “more creative and collaborative than ever before,” has been released as GPT-4.³ The pace of these AI announcements has certainly contributed to the sense that everyone, but perhaps especially those working in educational sectors, is scrambling to keep up.

At the core of many of the debates around AI models of culture—whether text, image, audio, or video—is anxiety about corporate control, proprietary software, and “black-box” systems. These models are machine learning systems, trained on massive amounts of data, but OpenAI and other companies typically do not disclose the exact composition of their models’ training data, making it difficult for users to ascertain the reliability, representativeness, or limitations of the models, or their relevance to different domains. That the companies building these models wish to obscure their provenance and functions, however, does not mean those things are unavailable for study and analysis.

This paper argues first that responding to this moment mandates developing toolkits for structural analyses of AI systems, and second that such methods should bring together two related traditions: the sociological school of bibliography and book history, which forefronts the linked technological, social, economic, and artistic contexts through which books and related media come into being, and the growing set of methods gathered under the mantle of “data archaeology,” which seek to outline similarly linked contexts through which datasets are created, distributed, and accessed. Such a bibliography for AI systems triangulates between materiality, infrastructure, and labor to describe the systems of digitization, archiving, extraction, and representation that lead from digitized and born-digital sources to the strings of text output by LLMs in response to user prompts. Just as D.F. McKenzie argued bibliographers should “describe not only the technical but the social processes of [texts’] transmission,” AI systems require a bibliography that studies the diffuse and often obscured sociology of textual data.⁴ In prior work I have argued for the value of “speculative bibliographies” for historical research, by which I primarily described enumerative bibliographies “in which textual associations are constituted propositionally, iteratively, and (sometimes) temporarily, as the result of probabilistic computational models.”⁵ Here, however, I advocate for speculative analytical or descriptive bibliographies that hypothesize about the material bases of closed AI systems based on extant clues about their composition and through reverse engineering of their outputs.

I approach this project first from the perspective of bibliography in part because of my own training, but also because the field offers a useful orientation and toolkit for the humanistic investigation of technical systems. Bibliographic methods were first developed to make predictions about a text’s material construction, editing, and significance from clues rooted historically in systems of production—the infrastructures of print culture. Signatures, chain lines, watermarks, and other physical clues were not placed in books deliberately by papermakers, printers, or binders for scholars, but are artifacts of production—papermaking, printing, binding—that scholars discovered could be reverse engineered in order to analyze the details of that production. In the late 20th century—I use this formulation even as I turn to dust—bibliographers such as Thomas Tanselle began thinking about how bibliographic frameworks could adapt to describe the increasingly dominant modes of electronic textual production, such as word processing, that would produce bookish objects going forward. Similarly to books in the printing press era or documents in the word processing era, complex computational systems in 2023 are not constructed with an eye toward transparency or material analysis, but leave clues that can be collated and theorized.

In addition to these historical parallels, I would also argue that LLMs are ripe for bibliographical analysis because they are, specifically and substantively, deeply bookish. We know they were trained, at least in part, on book data—as well as other forms familiar to bibliographers, such as academic journals, newspapers, and magazines. We might describe AI systems as “pastiche engines,” remixing existing cultural artifacts, rather than producing new ideas. This is both a critique and a means of locating these technologies within the continuum book historians and bibliographers study, and in fact places them in dialogue with a range of historical media that are also based in overlay, pastiche, and remix—media such as commonplace books, scrapbooks, palimpsests, or newspapers. The reprinting we study in the Viral Texts project is not identical to the species of text reuse produced by an LLM, but the practices share a family resemblance. As Ted Underwood has written, “the fuzzy, context-specific models produced by machine learning have a lot in common with the family resemblances historians glimpse in culture.”⁶ Because AI models are context-specific, I would argue, methods for contextualization—both quantitative and qualitative—will prove necessary for understanding them. LLMs and other AI models are not only bibliographic, but they do refract a sociology of texts that bibliographers can help assemble.

To date, much bibliographic study of digital textuality has focused on the materiality of computation—forensic studies of the hard drive’s textual inscriptions, for example, or vertical analysis of the layered software and hardware of a single ebook. Matthew Kirschenbaum’s landmark 2008 book Mechanisms traced both the “forensic materiality” of digital texts—e.g. the inscription of data onto the substrates of storage media—and formal materiality, which focuses on the representation of data in different computational systems—such as the ways specific file formats enable and constrain particular uses of data.⁷ It is difficult to know how we might trace the forensic materiality of an LLM’s output. While the data for LLMs exists somewhere, that existence is far more diffuse, abstract, and distributed than that of a discrete digital document or bound set of files on a particular computer, ebook reader, or server. Even if we had access to the corporate systems that house these models, is the textual data undergirding an LLM inscribed in any single place to which we could point, in the way that bibliographers point to a particular witness of a particular state of a particular edition of a particular book? Most likely not, as the models dynamically construct new texts each time a user prompts them, and these texts—while certainly not immaterial—are more dynamic and ephemeral than the mutable files on our hard drives.

To investigate the forensic materiality of LLMs, then, we should perhaps focus not on specific generated texts, but instead on the infrastructures undergirding AI systems. The field of book history was in many ways founded on Robert Darnton’s argument that the study of books should investigate the “communications circuit that runs from the author to the publisher…the printer, the shipper, the bookseller, and the reader,” an idea (and diagram) that has been much debated and amended since, but which remains central to the field’s work.⁸ To understand similar facets of AI systems, we could take as exemplars pieces like Kate Crawford and Vladan Joler’s “Anatomy of an AI System,” in which they seek to outline every element required for an Amazon Alexa device to answer a question, from the data model to the minerals mined to create the hardware.⁹ As Alan Liu has argued, cultural studies and critical infrastructure studies collapse into each other. In the twenty-first century, this communications circuit has fused with the supply chain, and an analysis of contemporary textual systems like LLMs requires a more capacious account still.

If we prompt ChatGPT itself to describe its underlying hardware, it returns a cryptically generic answer:

According to a 2020 blog post by OpenAI, ChatGPT-3 (which is a larger version of the model that powers ChatGPT) runs on clusters of thousands of individual CPUs and GPUs, which are distributed across multiple data centers. The specific hardware configurations used in these clusters can vary depending on the specific task being performed by the model and the availability of different hardware components at any given time. OpenAI has also stated that its hardware infrastructure uses a combination of custom-built hardware and off-the-shelf components from vendors such as Nvidia and Intel. This allows the organization to optimize its systems for the specific needs of its AI models while also taking advantage of the latest advances in commercial hardware technology.

From here the reply notes “the specific hardware used to power ChatGPT and other OpenAI models is a closely guarded secret, as the organization sees its hardware infrastructure as a key competitive advantage in the field of AI research.” While we may not be able to learn from ChatGPT or OpenAI’s publications the precise make and model of the hardware on which ChatGPT runs, as a category such hardware also is not a profound mystery. We can make reasonable assumptions about the kinds of machines required to host and process data at this scale, or serve the model through a web application to users. A basic search for “GPT-3 hardware” will surface many experts attempting to estimate the precise stack that would be required to run the ChatGPT service, whether in more formal articles, social media threads, or Reddit forums, and publications by the teams behind different models—particularly older models, where there is less need for competitive secrecy—offer more specific details.¹⁰

Similar publications also offer insight into the training data behind LLMs. For instance, a 2022 paper by OpenAI researchers describes the development of InstructGPT, an essential element of the model underlying ChatGPT which improved responsiveness to prompts by hiring human “labelers” to annotate responses from the model.¹¹ These labels assigned value to responses that labelers found more or less useful, and were used to fine-tune the model to produce responses more similar to those preferred by human annotators. This small detail is key to understanding ChatGPT not as a fully automated computational system, but as an iteratively developed system with humans in the loop. Were we to diagram the communications circuit underlying ChatGPT, in other words, we would include labelers, as an important human element within the AI system, along with the low-wage workers in Kenya whom a Time magazine exposé found OpenAI using to help filter toxic content. Following McKenzie, our AI bibliography must include the many communities of human labor that contribute to each layer of these technical systems.

Several of my examples thus far have pointed to the hybrid communities of field experts, practitioners, and hobbyists working to collectively assemble knowledge about the training data and algorithmic processes of large AI models like GPT, often assembling cumulative insight through channels such as Twitter threads, Reddit fora, or linked GitHub repositories. I understand these efforts through a set of approaches increasingly advocated by ethicists, such as reverse engineering, data archaeology, and algorithmic auditing. We can distinguish these approaches mostly by their explicit goals, as the methods and results of all three overlap:

Reverse engineering is the broadest term, encompassing a range of projects that seek to understand the processes within a closed, undescribed, or “black box” system by dissecting, when possible, or by collating indirect evidence when direct access is unavailable. Reverse engineering happens outside of computational contexts, including even when companies seek to copy a successful product made by a competitor. In terms of software, however, reverse engineering may involve studying code and data, when available, to ascertain how it works, or when that is not possible comparing inputs and outputs to infer what happens within the system. As James A. Hodges and Ciaran B. Trace show in their recent article “Preserving Algorithmic Systems,” “attempts at achieving accountability for algorithms are hampered by factors ranging from commercial developers’ profit motives to the simple obscuring effects of time” and thus often “algorithmic functionality must be reverse-engineered.”¹² Reverse engineering is essentially what analytical bibliographers do for bookish objects when they use physical clues to understand how those objects were designed and created.
Data archaeology is a narrower concept describing attempts to thoroughly document a technical system (including its hardware, data composition, and software processes) for future developers or scholars, with an eye toward enabling replication. For example, Benjamin Lee’s recent DHQ article traces the entwined technical and social stack enabling the Newspaper Navigator project he developed as the Innovator-in-Residence at the Library of Congress Labs.¹³ As Lee writes, formal data archaeologies are written by scholars “in support of…call[s] for transparency and responsible stewardship” in computational research and represent an ideal for data-rich scholarship.
Algorithmic auditing focuses more specifically on identifying and mitigating harm perpetuated by automated systems, often with the goal of identifying implicit bias within systems. This sometimes happens proactively, when a company or organization takes steps to review the impacts of software to ensure they are not inadvertently marginalizing particular groups or identities. However, there is a growing body of scholarship that works to audit the processes of less forthright organizations, often through reverse engineering. Consider Safiya Noble’s Algorithms of Oppression, which reverse engineered the ways Google’s PageRank algorithm reinforced racism through iterative search: comparing results for a range of search terms to understand subtle (and not so subtle) differences in the ways data about different racial groups was sorted and presented to users.¹⁴

When corporate secrecy or negligence prevents transparency, efforts to reverse engineer algorithmic systems provide valuable evidence for bibliographies of AI systems. Consider Vicki Boykis’ notebook “Everything I understand about chatgpt,” which attempts to collate both official and community documentation and outline a theory of ChatGPT’s training data and architecture. For example, Boykis highlights two components of training data named but not described in OpenAI’s publications, intriguingly titled Books1 and Books2, and points to literature describing the development of BookCorpus, which community investigators believe likely constitutes Books1. Boykis links both to formal publications, such as Jack Bandy and Nicholas Vincent’s “Retrospective Datasheet for BookCorpus,” and to work by people such as Shawn Presser, who provides access to the 18,000 plain text files that make up BookCorpus. These assembled resources allow researchers to explore at least some of the data thought to compose ChatGPT’s training corpora.

Alongside proprietary AI software, there is also a growing community of open-source models that can serve as proxies for understanding closed models. Databricks’ Dolly, for instance, sought to reverse-engineer the “instruction-following behavior” of InstructGPT and create a “truly open large language model” that would serve their customers who told the company they “would be best served by owning their models, allowing them to create higher quality models for their domain specific applications without handing their sensitive data over to third parties.” But Databricks also offers ethical reasons for creating this dataset, noting that “the important issues of bias, accountability and AI safety should be addressed by a broad community of diverse stakeholders rather than just a few large companies.” Looking beyond these stated reasons, the creation of such open-access mirrors of closed corporate systems provides a comparative framework for bibliographies of AI systems. We could investigate the prompts and responses recorded in Dolly’s training data in order to reconstruct how the data underlying a platform like ChatGPT is likely structured.

One principle for bibliographical investigation of AI will be iteration and collection. Users can probe the boundaries and assumptions of an LLM by simply asking variations on particular questions and observing the model’s responses. This is a form of close reading that seeks not to understand the full system, but to reverse engineer one small facet of the system, with a goal—perhaps—of collecting many such examples to build a wider knowledge base. In an assignment in my BookLab graduate seminar, for instance, I encouraged students to compare the outputs of different LLMs to the same prompt in order to hypothesize about how their training data is composed or how their models differ; to make incremental changes to prompts within a model to see how that changes the output; to evaluate the responses of LLMs to domains students knew well, to ascertain the veracity of the model’s data; and to interrogate “limit cases” that push a model beyond its capabilities. One student found ChatGPT incapable of producing bilingual text—at least in the hybrid, fast code switching way she wanted to see—producing instead text that switched languages by paragraph. Other students struggled against the guardrails OpenAI has placed on the application—not, importantly, the model itself—to produce an “insult” or “diss track.” While the GPT language model is perfectly capable of assembling negative language, OpenAI restricts such language in order to prevent the internet from internetting. Such discrete limit testing will not produce a theory of ChatGPT writ large, but the accumulation of such tests can begin filling out a picture of what kinds of texts comprise its training data and the ways the model has been fine tuned to enable certain kinds of outputs and prevent others.
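The kind of iteration and collection described above can be organized with very little code. What follows is a minimal sketch in Python, not the assignment's actual tooling: the function names `build_prompts` and `collect_responses` are hypothetical, and each "model" is assumed to be any function that takes a prompt string and returns a response string, so that a call to a real service could be swapped in by the reader.

```python
from itertools import product


def build_prompts(template, variations):
    """Expand a template with every combination of the supplied slot values.

    template: a string with named slots, e.g. "Write a {length} {genre}."
    variations: dict mapping each slot name to a list of values to try.
    """
    keys = sorted(variations)
    return [
        template.format(**dict(zip(keys, combo)))
        for combo in product(*(variations[key] for key in keys))
    ]


def collect_responses(models, prompts):
    """Ask every model every prompt; return (model_name, prompt, response) rows.

    models: dict mapping a label to a callable that takes a prompt string and
    returns a response string (a stand-in for a real model, by assumption).
    """
    return [
        (name, prompt, ask(prompt))
        for name, ask in models.items()
        for prompt in prompts
    ]


# Stand-in "models" used only to show the shape of the collected data.
stand_ins = {
    "echo": lambda prompt: prompt.upper(),
    "count": lambda prompt: str(len(prompt.split())),
}
prompts = build_prompts(
    "Write a {length} {genre} about rain.",
    {"length": ["short", "long"], "genre": ["poem"]},
)
for row in collect_responses(stand_ins, prompts):
    print(row)
```

Keeping every row, rather than only the surprising ones, is what lets individual limit tests accumulate into a wider knowledge base.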

This iterative approach is also amenable to computational investigation. At the beginning of this talk I compared the textual remixing of LLMs, which I described as “pastiche engines,” with the historical newspaper reprinting we study in Viral Texts. Given that similarity, the reprint detection methods we use for tracing historical reprinting might offer an approach to reverse engineering an LLM like GPT. I won’t belabor an explanation of those methods here, but we detail them in the second chapter, “Textual Criticism as Language Modeling,” of our Manifold scholar book, Going the Rounds. In extreme brief, however, we use a fuzzy matching method that identifies sets of duplicated phrases across texts. The texts do not have to match from end to end, but must simply contain enough matching phrases to be flagged as “the same” by our algorithm.
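To make the general idea concrete, here is a deliberately simplified sketch in Python. It is not the Viral Texts pipeline, which is described in the chapter cited above; it only illustrates the principle that two texts can be flagged as "the same" when they share enough phrases without matching end to end. The function names, the five-word phrase length, the threshold of three shared phrases, and the sample texts are all assumptions chosen for illustration.

```python
import re
from itertools import combinations


def word_ngrams(text, n=5):
    """Return the set of lowercase word n-grams found in a text."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}


def find_reprint_pairs(docs, n=5, min_shared=3):
    """Flag pairs of documents that share at least min_shared n-grams.

    docs: dict mapping a document id to its text.
    Returns a list of (id_a, id_b, shared_count) tuples.
    """
    shingles = {doc_id: word_ngrams(text, n) for doc_id, text in docs.items()}
    pairs = []
    for a, b in combinations(sorted(shingles), 2):
        shared = len(shingles[a] & shingles[b])
        if shared >= min_shared:
            pairs.append((a, b, shared))
    return pairs


# Invented sample texts: the first two reuse one passage inside different framing.
docs = {
    "paper_a": "the quick brown fox jumps over the lazy dog near the river bank today",
    "paper_b": "Editor's note: the quick brown fox jumps over the lazy dog, we are told",
    "paper_c": "an entirely different item about the price of wheat in the western markets",
}
print(find_reprint_pairs(docs))  # [('paper_a', 'paper_b', 5)]
```

Comparing every pair of documents is workable only for small collections; at the scale of a newspaper archive some form of indexing would be needed, which this sketch leaves out.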

While LLMs assemble text probabilistically, drawing on a wide range of textual sources, they nevertheless retain signals from their training data. If asked to write about a given topic, the words most likely to follow other words will bear a resemblance to the words that follow other words in training data about that topic. This is one reason that users have observed instances of plagiarism in LLM output. Often the best sequence of words—from the perspective of a language model at least—is precisely the sequence of words in an existing source, particularly for topics without wide coverage in the model’s training data.
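A toy example can show why sparse coverage pushes a model toward reproducing its source. The Python sketch below is a drastic simplification and an assumption on my part: it uses a bigram counter that always picks the most frequent next word, which is far cruder than the neural networks behind GPT. The function names and training sentences are invented for illustration.

```python
from collections import Counter, defaultdict


def train_bigrams(texts):
    """Count, for each word, which words follow it in the training texts."""
    following = defaultdict(Counter)
    for text in texts:
        words = text.lower().split()
        for current, nxt in zip(words, words[1:]):
            following[current][nxt] += 1
    return following


def generate(following, start, max_words=12):
    """Greedily emit the most frequent next word until none is known."""
    out = [start]
    while len(out) < max_words and following[out[-1]]:
        out.append(following[out[-1]].most_common(1)[0][0])
    return " ".join(out)


# Invented training data: one topic has a single source, the other has several.
training_texts = [
    "fern wrote weekly columns for ledger readers",
    "a sonnet has fourteen lines",
    "a sonnet has a turn",
    "a sonnet is a short poem",
]
model = train_bigrams(training_texts)
print(generate(model, "fern"))  # fern wrote weekly columns for ledger readers
```

Because only one training sentence mentions this topic, every "most likely next word" comes from that sentence, and the output is a verbatim copy.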

To observe this on a small scale, I used the GPT-4 API to generate texts on various topics and then compared n-grams (sequences of n words) between the generated texts and potential open-access sources. When I asked for a biography of nineteenth-century American author Fanny Fern,¹⁵ for example, I found that the generated biography shared 57 matching 4-grams with Fern’s Wikipedia page, 21 matching 4-grams with the biography on the Fanny Fern in the New York Ledger DH project website, 24 with Fern’s bio on the History of American Women site, 27 with Fern’s entry at Encyclopedia.com, and none with a Georgetown University bio of Fern. Similarly, a prompt asking for a definition and history of the sonnet returned an answer that shared 26 4-grams with entries on the sonnet in Wikipedia, 8 with Poets.org, 14 with Encyclopedia Britannica, 13 with the Poetry Foundation, and 11 with ThoughtCo.¹⁶ These examples are idiosyncratic and partial, proofs of concept, but more sustained and systematic work along these lines is both possible and necessary.
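A comparison of this kind can be sketched in a few lines of Python. This is a minimal illustration rather than the script used for the counts above: the tokenization (lowercased words with punctuation stripped), the function names, and the placeholder texts are all assumptions, and different tokenization choices would yield somewhat different counts.

```python
import re


def ngrams(text, n=4):
    """Return the set of lowercase word n-grams in a text."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}


def shared_ngram_counts(generated, sources, n=4):
    """Count n-grams shared between one generated text and each candidate source.

    sources: dict mapping a source label to its text.
    Returns (label, shared_count) pairs, highest count first.
    """
    target = ngrams(generated, n)
    counts = {label: len(target & ngrams(text, n)) for label, text in sources.items()}
    return sorted(counts.items(), key=lambda item: item[1], reverse=True)


# Placeholder texts invented for illustration, not real model output or sources.
generated = "she became the highest paid columnist in the united states"
sources = {
    "source_a": "by 1855 she became the highest paid columnist in the country",
    "source_b": "a columnist in the united states wrote weekly",
}
print(shared_ngram_counts(generated, sources))  # [('source_a', 5), ('source_b', 2)]
```

Using sets counts each distinct 4-gram once, so a phrase repeated many times in a source does not inflate that source's score.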

In fact, even as I was making these small experiments the Washington Post published a collaborative investigation, with the Allen Institute for AI, which takes a very similar approach to reverse engineer the sources of Google’s C4 dataset.¹⁷ C4, by the way, stands for Colossal Clean Crawled Corpus, which is a fact I just had to share. To determine which open websites were crawled, the researchers “ranked the remaining 10 million websites based on how many ‘tokens’ appeared from each in the data set,” glossing tokens as “small bits of text used to process disorganized information,” typically a word or phrase. As in my smaller experiments, they find that Wikipedia is by far the largest source, but they also note others that are less obvious:

Some top sites seemed arbitrary, like wowhead.com No. 181, a World of Warcraft player forum; thriveglobal.com No. 175, a product for beating burnout founded by Arianna Huffington; and at least 10 sites that sell dumpsters, including dumpsteroid.com No. 183, that no longer appear accessible.

The Post article provides a tool that enables readers to enter a domain to see how many of its tokens it shares with the C4 dataset, which signals how likely it was to have contributed to the corpus.
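The ranking step the Post describes can be approximated in a short Python sketch. This is not the Post's or the Allen Institute's code: it assumes the crawl is available as (URL, text) records, treats whitespace-separated words as a stand-in for model tokens, and uses a hypothetical function name and placeholder URLs.

```python
from collections import Counter
from urllib.parse import urlparse


def rank_domains_by_tokens(records):
    """Rank domains by how many tokens each contributes to a corpus.

    records: iterable of (url, text) pairs.
    Tokens are approximated as whitespace-separated words (an assumption;
    a real language-model tokenizer splits text differently).
    """
    totals = Counter()
    for url, text in records:
        domain = urlparse(url).netloc.lower()
        if domain.startswith("www."):
            domain = domain[4:]
        totals[domain] += len(text.split())
    return totals.most_common()


# Placeholder records invented for illustration.
records = [
    ("https://www.example.org/a", "one two three"),
    ("https://example.org/b", "four five"),
    ("https://example.com/x", "six"),
]
print(rank_domains_by_tokens(records))  # [('example.org', 5), ('example.com', 1)]
```

Folding the "www." prefix into the bare domain is a design choice made here so that two addresses for the same site are counted together.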

The goal for research along these lines cannot be to prove definitively that a particular model draws on this source or that, though such trends may emerge for more influential sources. Instead, the goal should be to generate probabilities about different potential sources: a speculative analytical or descriptive bibliography that gives us a framework for hypotheses and analysis. To return to my initial claims, it is precisely here that I think bibliographical traditions offer helpful analogs. Though some bibliographers have conceived of their work as empirical, in reality every collation formula is a speculation, though one informed by physical evidence. Just as the industrial processes of print culture left traces that we can use to theorize about physical texts’ production and use, the computational processes of digital culture leave traces we can use to theorize about AI texts.

Bringing together bibliographical and computational approaches allows us to perceive AI systems not as unknowable oracles but instead as bound, describable, and comparable cultural artifacts. Bibliography’s close attention to understanding and describing the combined material, technological, economic, and social facets of cultural production is precisely what is lacking in much tech discourse in 2023, and is precisely what is needed both to understand the shifts we are undergoing and to advocate for more transparent and responsive systems in the future. In making this argument I am not trying to claim large language models for bibliography, and there are certainly other AI systems that demand distinct expertise. But to the extent that language models are textual media that both draw on and contribute to book culture, they should be the province of bibliographers working in close collaboration with computer scientists, information scientists, technologists, and related experts. AI systems require a bibliography that takes seriously the diffuse and often obscured sociology of textual data.

Appendix A: GPT-4 Bio of Fanny Fern

Fanny Fern: A Biography

Early Life

Fanny Fern, born Sara Payson Willis on July 9, 1811, in Portland, Maine, was an American novelist, journalist, and feminist activist. She was the fourth of nine children born to Nathaniel and Hannah Willis. Her father was a newspaper publisher and writer, which played a pivotal role in Fern’s interest in writing.

In her early life, Fern, like her siblings, was educated at Catherine Beecher’s boarding school in Hartford, Connecticut. Following her education, she returned to Boston, where her family had moved, and began teaching at a local school.

Personal Life

Fanny Fern married banker Charles Eldredge in 1837, and together they had three children, Mary Stace, Grace, and Ellen Willis Eldredge. Tragically, Charles died of typhoid fever in 1846, leaving Fern to support her family alone.

In 1849, Fern remarried to Samuel P. Farrington, a merchant. However, the marriage was unhappy, and the couple divorced in 1851. Following her divorce, Fern decided to seek financial independence through her writing to support herself and her children.

Career

In 1851, adopting the pen name Fanny Fern, she began writing essays, sketches, and short stories for several newspapers and periodicals, including “The Olive Branch” and “True Flag.” These writings won her a widespread reputation as a courageous, witty, and insightful writer who addressed prevailing social issues, such as women’s rights, marriage, education, and labor rights.

Despite facing criticism from her male counterparts and dealing with societal norms of that time, Fern’s perseverance led to significant accomplishments. In 1853, she published her first collection, “Fern Leaves from Fanny’s Portfolio,” which sold over 70,000 copies in its first year. Several more volumes of “Fern Leaves” soon followed.

Fanny Fern’s career as a writer took a leap forward in 1855 when she joined the “New York Ledger” as a columnist. She became the highest-paid columnist in the United States at that time, earning $100 per week. Her column, filled with satirical and humorous commentary on contemporary social issues, made her a household name.

In 1856, she remarried again, to newspaper editor and biographer James Parton, who encouraged her literary career. Her novel “Ruth Hall,” published that same year, is largely autobiographical, chronicling the struggles of a young widow trying to support her children through writing. It became a bestseller and was praised for its realistic portrayal of women’s experiences.

Later Years and Legacy

Fern continued to write for the “New York Ledger” and authored more novels, such as “Rose Clark” (1856) and “Folly as It Flies” (1868). While her popularity diminished in the late 19th century, she made a significant contribution to American literature from a female perspective.

Fanny Fern died of cancer in New York City on October 10, 1872, at the age of 61. Today, she is celebrated as a pioneer in feminist writing, and her works have inspired generations of women writers to explore themes that were considered taboo in her time.

Appendix B: GPT-4 Description of the Sonnet

The sonnet is a poetic form that consists of 14 lines, known for exploring themes of love, beauty, and the passage of time, though it has also been used to address a variety of other subjects. It is believed to have originated in Italy in the 13th century and was later popularized by the Italian poet Petrarch in the 14th century. Petrarch’s sonnets, written in Italian, primarily focused on his unrequited love for a woman named Laura. Throughout the years, the sonnet has evolved, and its structure and rhyme schemes have adapted to the preferences of different poets and languages.

The sonnet is typically written in iambic pentameter, a metrical pattern consisting of five pairs of stressed and unstressed syllables per line. There are two main types of sonnets: the Italian (or Petrarchan) sonnet and the English (or Shakespearean) sonnet.

The Italian sonnet is divided into two parts: the octave, which consists of the first eight lines, and the sestet, which includes the remaining six lines. The octave presents a problem, situation, or question, while the sestet resolves or comments on it. The rhyme scheme for the octave is usually ABBAABBA, while the sestet has various rhyme patterns, such as CDECDE or CDCDCD.

The English sonnet, on the other hand, is organized into three quatrains, each comprising four lines, and a final rhymed couplet. The rhyme scheme for this type of sonnet is ABABCDCDEFEFGG. The three quatrains often introduce different aspects or develop a single idea, while the final couplet offers a conclusion, a twist or a resolution to the preceding lines.

As the sonnet spread from Italy to England in the 16th century, it was embraced and developed by English Renaissance poets like Sir Thomas Wyatt, Edmund Spenser, and, most notably, William Shakespeare. Shakespeare composed a total of 154 sonnets, which cover themes of love, beauty, and the transient nature of life.

The sonnet form continued to thrive in the 17th century through the works of poets like John Donne and Milton, and it has remained a popular form among poets throughout history. Romantic poets like William Wordsworth, John Keats, and Percy Bysshe Shelley reinvigorated the form in the 19th century; more recently, in the 20th and 21st centuries, it was seen in the works of poets like Elizabeth Barrett Browning, e. e. cummings, and Seamus Heaney.

In summary, the sonnet form is a 14-line poem renowned for its exploration of love, beauty, and time. Though it originated in Italy, it has since been embraced and adapted by poets across the world, evolving through various rhyme schemes and structures while maintaining its distinctive essence.

1. For OpenAI’s introduction to ChatGPT, see https://openai.com/blog/chatgpt/. For the best description I have found thus far about how a system like ChatGPT produces language, see Stephen Wolfram’s post “What Is ChatGPT Doing … and Why Does It Work?” https://writings.stephenwolfram.com/2023/02/what-is-chatgpt-doing-and-why-does-it-work/.
2. An overview of the latter debate can be found in Susan D’Agostino’s Inside Higher Ed article “ChatGPT Advice Academics Can Use Now” (12 January 2023), which includes interviews with faculty across the spectrum of enthusiasm to skepticism: https://www.insidehighered.com/news/2023/01/12/academic-experts-offer-advice-chatgpt
3. See https://openai.com/product/gpt-4 for the official description of GPT-4 from OpenAI.
4. D. F. McKenzie, Bibliography and the Sociology of Texts (Cambridge: Cambridge University Press, 2004), p. 13.
5. Ryan Cordell, “Speculative Bibliography,” Anglia 138, no. 3 (September 15, 2020): 519–31, https://doi.org/10.1515/ang-2020-0041.
6. Ted Underwood, “Why an Age of Machine Learning Needs the Humanities,” Public Books (blog), December 5, 2018, https://www.publicbooks.org/why-an-age-of-machine-learning-needs-the-humanities/.
7. Matthew G. Kirschenbaum, Mechanisms: New Media and the Forensic Imagination (Cambridge, Mass.: MIT Press, 2008).
8. Robert Darnton, “What Is the History of Books?,” Daedalus 111, no. 3 (July 1, 1982): 65–83.
9. Kate Crawford and Vladan Joler, “Anatomy of an AI System: The Amazon Echo As An Anatomical Map of Human Labor” (AI Now Institute and Share Lab, September 7, 2018), http://www.anatomyof.ai.
10. Tom B. Brown et al., “Language Models Are Few-Shot Learners” (arXiv, July 22, 2020), http://arxiv.org/abs/2005.14165.
11. Long Ouyang et al., “Training Language Models to Follow Instructions with Human Feedback” (arXiv, March 4, 2022), http://arxiv.org/abs/2203.02155.
12. James A. Hodges and Ciaran B. Trace, “Preserving Algorithmic Systems: A Synthesis of Overlapping Approaches, Materialities and Contexts,” Journal of Documentation ahead-of-print, no. ahead-of-print (January 1, 2023), https://doi.org/10.1108/JD-09-2022-0204.
13. Benjamin Lee, “Compounded Mediation: A Data Archaeology of the Newspaper Navigator Dataset,” Digital Humanities Quarterly 15, no. 4 (2021), http://www.digitalhumanities.org/dhq/vol/15/4/000578/000578.html.
14. Safiya Noble, Algorithms of Oppression: How Search Engines Reinforce Racism (New York: NYU Press, 2018).
15. I have copied the full Fanny Fern biography generated by GPT-4 in Appendix A.
16. I have copied the full text generated by GPT-4 about the sonnet in Appendix B.
17. Kevin Schaul, Szu Yu Chen, and Nitasha Tiku, “Inside the Secret List of Websites That Make AI like ChatGPT Sound Smart,” Washington Post, April 19, 2023, https://www.washingtonpost.com/technology/interactive/2023/ai-chatbot-learning/.