You might wonder, for example, what place or location names appear in American literary texts published in 1851, and you devise a program that will tell you. You will then have data.
But what do you do with the data?
The example is not a hypothetical one. It is put forward by Matthew Wilkens in his essay “Canons, Close Reading, and the Evolution of Method” (“Debates in the Digital Humanities,” ed. Matthew Gold, 2012). And Wilkens does do something with the data. He notices that “there are more international locations than one might have expected” — digital humanists love to be surprised because surprise at what has been turned up is a vindication of the computer’s ability to go beyond human reading — and from this he concludes that “American fiction in the mid-nineteenth century appears to be pretty diversely outward looking in a way that hasn’t received much attention.”
More international locations named than we would have anticipated; therefore mid-19th century American fiction is outward-looking, a fact we would not have “discovered” were it not for the kind of attention a computer, as opposed to a human reader, is capable of paying . . .
But does the data point inescapably in that direction? Don’t we have to know in what novelistic situations foreign lands are alluded to and by whom? If the international place names are invoked by a narrator, it might be with the intention not of embracing a cosmopolitan, outward perspective, but of pushing it away: yes, I know that there is a great big world out there, but I am going to focus in on a landscape more insular and American. If a character keeps dropping the names of towns and cities in Europe, Africa and Asia, the novelist could be alerting us to his pretentiousness and admonishing the reader to stay close to home. If a more sympathetic character daydreams about Paris, Istanbul and Moscow, she might be understood as caressing the exotic names in rueful recognition of the experiences she will never have.
The list of possible contextual framings is infinite, but some contextual framing is necessary if we are to move from noticing the naming of international locations to the assigning of significance. Otherwise we are asserting, without justification, a correlation between a formal feature the computer program just happened to uncover and a significance that has simply been declared, not argued for. (Frequency is not an argument.) Don’t we have to actually read the books, before saying what the patterns discovered in them mean?
I agree with Fish that “data mining” as he describes it is inadequate to the task of interpretation as he defines it. However, data mining is not meant to (nor can it) provide information about something as abstract as “intent”; it is a way of looking for and at patterns in data sets that are too large for human comprehension, and that address a larger issue than the author’s “intentionality.” Separating data from “noise” is precisely what such algorithms are designed to accomplish, and the patterns discovered have less to do with the “intentionality” of an author’s writing than with the circulation of ideas, words, and concepts throughout a historical period or geo-spatial context.
I think the example he cites about “more international locations named than we would have anticipated” in American fiction is precisely the sort of data that is interesting. Fish argues that we can’t determine the “direction” of that international interest for any particular text without close reading, and he questions whether the directionality of the interest can be meaningfully ascertained from the aggregate data. But “outward-looking” means something different in a geo-political or geo-spatial analysis than in a close reading. This is an analysis that moves away from individual authority and intentionality, from the text as an artifact of a particular human intelligence, and looks at the cultural field of transatlantic influence in a particular historical, cultural, geographic moment. Mid-19th century American texts have a larger boundary or horizon than British texts, perhaps, and this is one way to measure it.
Data needs interpretation, yes; but close reading is not the only interpretive strategy out there.