Jockers, Matthew, Stanford University, USA, email@example.com
Whether consciously influenced by a predecessor or not, it might be argued that every book is in some sense a necessary descendant of, or necessarily ‘connected to, ’ those before it. Influence may be direct, as when a writer models his or her writing on another writer,2 or influence may be indirect in the form of unconscious borrowing. Influence may even be ‘oppositional’ as in the case of a writer who wishes to make his or her writing intentionally different from that of a predecessor. The aforementioned thinkers offer informed but anecdotal evidence in support of their claims of influence. My research brings a complementary quantitative and macroanalytic dimension to the discussion of influence. For this, I employ the tools and techniques of stylometry, corpus linguistics, machine learning, and network analysis to measure influence in a corpus of late 18th- and 19th-century novels.
The 3,592 books in my corpus span from 1780 to 1900 and were written by authors from Britain, Ireland, and America; the corpus is almost even in terms of gender representation. From each of these books, I extracted stylistic information using techniques similar to those employed in authorship attribution analysis: the relative frequencies of every word and mark of punctuation are calculated and the resulting data winnowed so as to exclude features not meeting a preset relative frequency threshold.3 From each book I also extracted thematic (or ‘topical’) information using Latent Dirichlet Allocation (Blei, Ng et al. 2003; Blei, Griffiths et al. 2004; Chang, Boyd-Graber et al. 2009). The thematic data includes information about the percentages of each theme/topic found in each text.4 I combine these two categories of data – stylistic and thematic – to create ‘book signals’ composed of 592 unique feature measurements. The ‘Euclidian” metric is then used to calculate every book’s distance from every other book in the corpus. The result is a distance matrix of dimension 3,592 x 3,592.5
While measuring and tracking ‘actual’ or ‘true’ influence – conscious or unconscious – is impossible, it is possible to use the stylistic-thematic distance/similarity measurements as a proxy for influence.6 Network visualization software can then be used as a way to organize, visualize, and study the presence of influence among of books in my corpus.7
This is a really interesting measure of literary “influence.” My project indicates influence in a different fashion, by tracing the relations between words or tropes in literary circles.