Computational text analysis (CTA) is an emerging field that uses computational methods to analyze texts. CTA draws on computer science, machine learning, computational linguistics, and literary theory. Machine learning and statistical methods make it possible to explore how language is used in particular contexts: how frequently different words appear, the sentiment of a word or a text, and nuances in the ways words are associated with one another.
We will use CTA to engage in "Distant Reading", a term coined by literary theorist Franco Moretti. Distant Reading stands in contrast to the more familiar "Close Reading": a deep engagement with a particular text or a passage from a text. Distant Reading engages not with a particular text but with a large corpus of texts: e.g., all novels published in English in the 20th century, all articles published in The New York Times and The Washington Post in the last decade, or the lyrics of all top-100 pop songs from the 1980s. Computational techniques applied to large collections of texts allow one to ask broad questions about structural and linguistic change over time and to look for patterns of language use that would not be evident from analysis of one or even several individual texts.
Distant Reading, and computational text analysis more generally, is not intended to replace close reading but to complement it. We will use CTA to explore how power structures and systems such as race, gender, and colonialism manifest themselves in bodies of text.
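To make the idea of counting how frequently different words are used a little more concrete, here is a minimal sketch in Python. It is only an illustration, not the method or software the course will necessarily use: the two tiny "collections" are invented stand-ins for real corpora, and the names `corpus`, `tokenize`, and `relative_frequencies` are hypothetical.

```python
import re
from collections import Counter

# Hypothetical toy corpus: two tiny invented "collections" standing in
# for the large bodies of text that distant reading would actually use.
corpus = {
    "collection_a": [
        "The sea was calm. The sea was wide.",
        "She sailed the sea alone.",
    ],
    "collection_b": [
        "The city was loud.",
        "He walked the city streets; the city never slept.",
    ],
}


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def relative_frequencies(texts):
    """Return each word's share of all word tokens across the given texts."""
    counts = Counter()
    for text in texts:
        counts.update(tokenize(text))
    total = sum(counts.values())
    return {word: n / total for word, n in counts.items()}


# One frequency table per collection, so the collections can be compared.
freqs = {name: relative_frequencies(texts) for name, texts in corpus.items()}

for name, table in freqs.items():
    sea = table.get("sea", 0.0)
    city = table.get("city", 0.0)
    print(f"{name}: sea={sea:.3f}, city={city:.3f}")
```

Relative rather than raw counts are used here so that collections of different sizes can be compared. With a real corpus, the same comparison could be made across decades, authors, or publications.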
Goals
- Stay physically and mentally healthy, and maintain intellectual and personal connections and a sense of community.
- Engage together in an experimental, interdisciplinary, co-learning experience.
- Leave the class thinking differently about the relations between language and power, and how language and race/gender/coloniality make and re-make each other.
- Have fun while learning a lot.
Outcomes
Students (and instructors) who successfully complete this course will:
- Gain a conceptual understanding of various CTA techniques, including word frequency analysis, topic modeling, and sentiment analysis;
- Learn how to apply these techniques, both by using pre-existing software and by doing their own coding;
- Gain experience asking questions about power structures/systems---race, gender, colonialism---and how those structures manifest themselves in corpora of text;
- Learn how these questions of power can be explored using algorithmic methods; and
- Gain experience critiquing algorithmic methods through the lenses of race, gender, and colonialism.
The building in which we gather for this class, and all of College of the Atlantic, is located on traditional lands of the Wabanaki people. The Native American tribes in Maine today include the Abenaki, Maliseet, Micmac, Penobscot, and Passamaquoddy, collectively referred to as the Wabanaki. I believe it is important to acknowledge that our presence on this land entangles us in the web of colonialism, past and present.