Pynchon's New Worlds

International Pynchon Week 2017

La Rochelle, June 5-9, 2017


'Maybe, but it’s code’s all it is': Thomas Pynchon, Cow Country, and Computational Stylometry
Martin Eve

Scheduled in the Hôtel Fleuriau: Tuesday 6 June, from 16:45 to 17:15

In mid-2015, Art Winslow caused something of an online furore when he suggested that the pseudonymously authored novel Cow Country, by Adrian Jones Pearson, was in fact a work by Thomas Pynchon. A full-blown argument then erupted when this was countered by Nate Jones and Pynchon's own publisher. Indeed, Penguin thundered: “[w]e are Thomas Pynchon's publisher and this is not a book by Thomas Pynchon”.

While the great and the good of the contemporary republic of letters argued over authorship, however, a range of stylometric techniques exists that could assist in the debate. As the name implies, computational stylometry is the measurement (“metry”) of stylistic properties of texts (“stylo”) using computers. Stylometry, as a quantifying activity, has a long and varied history, from court cases in which the accused was acquitted on the basis of stylometric evidence, to literary authorship attribution. In the latter case, as charted by Anthony Kenny, the discipline dates back to approximately 1851, when Augustus de Morgan suggested that a dispute over the attribution of certain epistles could be settled by measuring average word lengths and correlating them with known writings of St Paul. At the time of writing, according to Ariel Stolerman, computational forensic stylometry “can identify individuals in sets of 50 authors with better than 90% accuracy, and [can] even [be] scaled to more than 100,000 authors”.

In this paper, I give a humanistic/critical background to stylometry and its important limitations before applying a range of stylometric techniques to the novels of Thomas Pynchon alongside the novel by “Pearson”. In particular, I examine the widely used unsupervised “Burrows's delta” algorithm of most-frequent-word comparisons as well as a part-of-speech frequency comparison using the Stanford PoS tagger.
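
For readers unfamiliar with the most-frequent-word comparison, the following is a minimal sketch of a Burrows's delta calculation in Python. It is an illustration under stated assumptions, not the pipeline used for the experiments reported in this paper: the function names (tokenize, relative_frequencies, burrows_delta), the naive regular-expression tokenizer, the default of 150 most frequent words, and the use of the population standard deviation are all choices made here for the example. A lower delta indicates that the candidate text sits closer to a reference text in its use of common words; it does not, on its own, establish authorship.

```python
import re
import statistics
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into simple word tokens (assumed tokenizer)."""
    return re.findall(r"[a-z']+", text.lower())


def relative_frequencies(tokens, vocabulary):
    """Return each vocabulary word's share of all tokens in one text."""
    counts = Counter(tokens)
    total = len(tokens)
    return {word: counts[word] / total for word in vocabulary}


def burrows_delta(references, candidate, n_words=150):
    """Compare a candidate text against labelled reference texts.

    references: dict mapping a label to the full text of a reference work.
    candidate:  the full text of the disputed work.
    Returns a dict mapping each label to its delta score (lower = closer).
    """
    if len(references) < 2:
        raise ValueError("at least two reference texts are needed for z-scores")

    tokens = {label: tokenize(text) for label, text in references.items()}

    # Most frequent words across the pooled reference corpus.
    pooled = Counter()
    for toks in tokens.values():
        pooled.update(toks)
    vocabulary = [word for word, _ in pooled.most_common(n_words)]

    freqs = {label: relative_frequencies(toks, vocabulary)
             for label, toks in tokens.items()}

    # Mean and standard deviation of each word's frequency across references.
    means = {w: statistics.mean([f[w] for f in freqs.values()]) for w in vocabulary}
    stdevs = {w: statistics.pstdev([f[w] for f in freqs.values()]) for w in vocabulary}

    # Words with no variation carry no signal and would divide by zero.
    usable = [w for w in vocabulary if stdevs[w] > 0]
    if not usable:
        raise ValueError("no word varies in frequency across the references")

    def z_scores(freq):
        return {w: (freq[w] - means[w]) / stdevs[w] for w in usable}

    candidate_z = z_scores(relative_frequencies(tokenize(candidate), vocabulary))

    # Delta is the mean absolute difference between z-scores.
    deltas = {}
    for label, freq in freqs.items():
        reference_z = z_scores(freq)
        deltas[label] = statistics.mean(
            [abs(candidate_z[w] - reference_z[w]) for w in usable]
        )
    return deltas
```

In use, one would pass the full texts of the reference novels keyed by author or title, plus the disputed text, and rank the resulting scores. Very short texts give unstable frequencies, so the measure is only meaningful on samples of substantial length.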

At the close of the paper, I give the results of my computational experiments, while still noting that we are far from having a perfect system for attribution. After all, literary forensics is almost always a post-facto attempt at attributing meaning, even in the anti-intentionalist schools. In this case, though, it may transpire that I have an answer (“maybe”). But as Pynchon puts it in Bleeding Edge: “it’s code’s all it is”, for sure.