To tackle this idea of “distant reading,” I read a selection of articles and blog posts on the subject and was then asked to use a couple of commonly accessible distant reading sites to put the idea into practice. Distant reading is the practice of taking an extremely large body of literary work – for example, all plays written in England between 1500 and 1700 – dumping it into a computer processor, and then analyzing the trends that you see, such as word frequency or plot types. The idea is that recognizing patterns in an astronomical sample of texts will give us insight into the literary time period. This is a new idea, and I believe that distant reading – because it is so accessible – can become an excellent aid for scholars and theorists of literature.
To put this idea into practice, I used two widely available
websites that perform this kind of big data analysis. First, I entered five
different articles and blog posts on this idea of distant reading (referenced
below) into the program called Voyant. Then I entered some data into Google’s Ngram program to gain a different perspective on the texts.
When you dump a collection of texts into Voyant, the program
recognizes word repetition and presents it to the user in a variety of ways
such as graphs, word clouds, and text highlighting. I was let down to find that
the most used words in the readings were “the,” “of,” “to,” “a,” and the like. I
searched and searched, but couldn’t find a filter that would eliminate those
seemingly benign words. (I call them benign because they didn’t align with my
interests. However, as one article pointed out, Gothic literature can be
defined by the overwhelming use of “the.”) While it is seemingly missing a
filter, Voyant does allow the user to manipulate the data. I was able to select
as many top words as I wanted and see how their graphs compare. Each word’s
graph maps how often it was used in each reading. When comparing graphs of
certain words, we can start noticing trends throughout the readings. As Mae
Capozzi noted in her blog “Reading from a Distance,” having visuals like these
maps brings literary theory out of the phase of abstraction and gives it a
presence that is almost physical (Capozzi). No longer do theorists and critics have to
simply talk about their ideas; now they have visualizations.
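As a rough illustration of the kind of filter described above, here is a minimal Python sketch that counts word frequency while skipping common function words. It is not how Voyant works internally. The stop-word list is a tiny hand-picked assumption, and the function name top_words and the sample sentence are made up for the example.

```python
import re
from collections import Counter

# A deliberately tiny, hand-picked stop-word list (an assumption for this
# sketch; real tools ship much longer lists).
STOP_WORDS = {"the", "of", "to", "a", "and", "in", "is", "it", "that"}


def top_words(text, n=10, stop_words=STOP_WORDS):
    """Return the n most frequent words in text, skipping stop words."""
    # Lowercase the text and pull out runs of letters and apostrophes.
    words = re.findall(r"[a-z']+", text.lower())
    counts = Counter(w for w in words if w not in stop_words)
    return counts.most_common(n)


# Made-up sample sentence, used only to show the filter at work.
sample = "The reading of the books is a new kind of reading."
print(top_words(sample, n=3))
```

With the stop words removed, “reading” rises to the top of the list instead of “the” and “of.”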
When I manipulated my data to show me a graph of how the most
important top words were distributed throughout the readings, I got an
interesting result. The words I chose to map out were: reading, more, books,
topic, digital, humanities, literary, new, moretti, literature, words, and
distant. These were the top words excluding articles, pronouns, state-of-being
verbs, qualifiers, etc. If I had only seen this list of top words but never
read the articles and blogs, I would still be able to infer that distant
reading is a new development in the way we read literature that incorporates
the digital humanities. If I had a larger sample of texts, I would probably be
able to hone my understanding of distant reading down to an even more accurate
and precise definition.
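The per-reading graphs described above boil down to a simple calculation: for each chosen word, how large a share of each reading it makes up. The following Python sketch shows one way to compute that, under stated assumptions. The function name word_distribution is hypothetical, and the two short texts are invented placeholders, not the articles listed in the references.

```python
import re
from collections import Counter


def word_distribution(readings, chosen_words):
    """Map each chosen word to its relative frequency in each reading.

    readings: dict of {title: full text}
    chosen_words: iterable of lowercase words to track
    """
    distribution = {word: {} for word in chosen_words}
    for title, text in readings.items():
        words = re.findall(r"[a-z']+", text.lower())
        counts = Counter(words)
        total = len(words) or 1  # avoid dividing by zero on empty text
        for word in distribution:
            distribution[word][title] = counts[word] / total
    return distribution


# Placeholder texts invented for this example (not the actual readings).
readings = {
    "reading_a": "distant reading is a new way of reading",
    "reading_b": "close reading is the old way",
}
for word, by_reading in word_distribution(readings, ["reading", "distant"]).items():
    print(word, by_reading)
```

Using relative frequency rather than raw counts keeps a long article from dominating a short blog post in the comparison.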
My experience with Google Ngram was a little different from my experience with
Voyant. I couldn’t find a way to enter all of the readings into that processor,
and if I could have, I would assume that the data would look pretty much the
same as it did in Voyant. Instead, I took the key concepts and entered them
into the search bar. Initially I chose to examine the use of the terms “digital
humanities,” “distant reading,” and “big data” in English from 1950 (the decade in
which what we now know as digital humanities began) to 2008. (I tried to expand the
search to 2015, but every time I tried, it reset to 2008.) Ngram
allowed me to see the trends in usage for these words. All three started to
trend upward in the 90s – likely thanks to the internet. The term “digital
humanities” wasn’t used at all until the advent of the internet in the
nineties. I found it interesting that the term “big data” has been used since
the invention of the computer, as the computer gave people the opportunity to
archive and catalogue. Usage of “big data” rose dramatically between
1990 and 2000, when it peaked.
Using Ngram in this way also provided evidence for the pitfalls
of distant reading. In the graph I referenced above, the term
“distant reading” seems to have been frequently used around 1950. To see this
in greater detail, I extended my time frame to examine texts in English between
1900 and 2008. It seems that the term “distant reading” was relatively popular
between 1935 and 1955 (this can be seen in the graph below). Surely, the people using “distant reading” back then
were not referring to it in the way that we are today. However, the Ngram
processor doesn’t know this. If someone is analyzing information using Ngram,
they could very easily be tripped up by data that conflates different meanings of the same phrase.
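The pitfall is easy to reproduce with a simple phrase counter. The Python sketch below is not Google's implementation. It only illustrates the general idea of measuring what share of two-word sequences in each year match a phrase. The function name phrase_share_by_year is hypothetical, and both sentences in the toy corpus are invented for the example.

```python
import re


def phrase_share_by_year(corpus, phrase):
    """Return {year: share of two-word sequences equal to phrase}.

    corpus: dict of {year: list of text snippets}
    phrase: a two-word phrase such as "distant reading"
    """
    target = tuple(phrase.lower().split())
    shares = {}
    for year, snippets in corpus.items():
        matches = 0
        total = 0
        for snippet in snippets:
            words = re.findall(r"[a-z']+", snippet.lower())
            bigrams = list(zip(words, words[1:]))  # adjacent word pairs
            total += len(bigrams)
            matches += sum(1 for pair in bigrams if pair == target)
        shares[year] = matches / total if total else 0.0
    return shares


# Invented sentences (an assumption for illustration): the same two words,
# used in two unrelated senses.
toy_corpus = {
    1940: ["The gauge allows distant reading of the dial"],
    2005: ["Distant reading treats literature as data"],
}
shares = phrase_share_by_year(toy_corpus, "distant reading")
print(shares)
```

Both years register a match, because the counter sees only the surface form of the words and has no way to tell which meaning was intended.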
I think that distant reading tools like Voyant and Ngram could
be great to use in the classroom, especially when summing up a literary
era. Have students make predictions about a certain literary era before reading
texts: common themes, obstacles, world views, and so on. Then, have students read 5
to 10 texts or excerpts of texts from that era. They would be expected to close
read these texts. After their close reading, they would be asked to identify
what they believe would be commonalities throughout texts of the period. Once
the students are finished reading, have them enter a large sample of texts from
this period into one of these big data processors. Using the word
repetition the tool reveals, have the students reevaluate the common themes they predicted. Ask
them: how does distant reading compare to close reading? From which did they
gain the best understanding of the literary era? How do the two complement each
other? Then, the students can enter some of the themes together into Google Ngram and see how they have evolved, connected, or opposed each other over
time. Both these tools could be wonderful resources in the classroom.
After reading about distant reading and practicing
putting it to use, I view it as a nice companion to traditional close reading.
As Joshua Rothman stated at the end of his article, “An Attempt to Discover the Laws of Literature,” “We can continue to read the old fashioned way. Moretti,
from afar, will tell us what he learns” (Rothman). I think that to analyze
literature solely from afar is to strip it of everything that makes it worthy
of analysis at all. However, the big data information that we are discovering
as we analyze large corpora of texts can be useful in its ability to uncover
new trends in past periods of not only literature but also human cultures.
Readings Referenced:
Capozzi, Mae. “Reading from a Distance.” Blog. https://readingfromadistance.wordpress.com/
Cohen, Patricia. “Analyzing Literature by Words and Numbers.” The New York Times. 3 Dec. 2010. Web. 27 July 2015.
Rothman, Joshua. “An Attempt to Discover the Laws of Literature.” The New Yorker. 20 March 2014. Web. 27 July 2015.
Schulz, Kathryn. “What is Distant Reading?” The New York Times Sunday Book Review. 24 June 2011. Web. 27 July 2015.
Underwood, Ted. “How Not to Do Things with Words.” Blog. 25 Aug 2012. Web. 27 July 2015.