I’m not usually a prolific tweeter. I tend to find between 1 and 5 interesting tweets each day (or every other week when I get distracted) and retweet them. But Tuesday, as those of you who follow me might have noticed, was an exception. I decided to try my hand at live tweeting the end-of-quarter project presentations in Alan Liu’s ENGL 236 class, “Introduction to the Digital Humanities”. The assignment: write up a detailed grant proposal for a Digital Humanities project and, if possible, provide a small prototype. The results were spectacular and I know I did not do them justice in 420 characters (I limited myself to three tweets per project – two for the presentation and one for the Q&A). But this is not a post about my first time live tweeting, which was quite an experience and a really valuable exercise in attention and brevity. This is a post about the assignment itself and the kinds of ideas that it generated.
First, though, I should probably speak about my place in this class. I wasn’t in it. I wasn’t even officially auditing it. I just showed up every week because it was held in my* lab in between my office hours and because I was deeply curious what exactly an introduction to the digital humanities was. Additionally, as my lab responsibilities include holding office hours and providing support for those engaged in digital and new media projects, it seemed wise to remain abreast of what the class was interested in doing.
That meant, however, that while everyone else was gearing up to present their final projects, I was relaxing because there was no assigned reading for the week. In one sense, I did not actually have to be there. In another sense, this was the most important class of the term.
This was the class about imagining the future. This was the moment when my colleagues–many of whom probably still would not define DH as one of their fields–advanced proposals for projects that they thought were interesting, that they would find useful in their scholarly work and in whose creation they would like to participate.
What is so interesting about these projects is that they represent a microcosm of the kinds of projects humanist scholars would like to see available. If we make the assumption that we design imaginary projects that we wish existed for our research–a fair assumption, especially given how often presenters related their projects to dissertation work in progress–then these mock prospectuses become a window into what humanists would do with DH if they had “world enough and time.”
Obviously, this is not a representative sample, but it is an interesting starting point. What points of intersection appear in these projects? What elements of digital inquiry have been left out entirely? What kinds of things do my peers want to be able to do?
If you missed the tweets, I’ve Storified them here (or you can just check #engl236). If you would like to see the actual proposals rather than simply my summaries, they can be found at Project Prospectuses along with the full text of the assignment.
So here’s my take on the projects as a whole.
First, the people want databases. Eleven of the fourteen projects began with the creation and maintenance of a database. Often, they proposed a database of media, sometimes crowdsourced, where as many examples as feasible of that media would be located and available for comparison.
That was the second thing in common with nearly all of these database projects. Built into the database itself were the tools necessary to sort, reorganize and analyze the data. This isn’t just about making it easy to track down the media that make up large-scale analyses; it’s about making it easy to perform the analyses themselves.
For example, Percy proposed a database of legal documents with a built-in stop list that specializes in sorting out the excessively common legal terms that pepper court documents, but would be meaningless from a semantic standpoint. This kind of project makes it easy for someone with little legal training to go in and work with these texts. The “hard work” of figuring out how to cope with reams of legalese has already been done.
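To make the stop list idea concrete, here is a minimal sketch in Python. It is not Percy’s implementation: the word list, the function name and the sample sentence are all invented for illustration, and a real legal stop list would be far longer and built by someone who knows the corpus.

```python
import re
from collections import Counter

# Hypothetical stop list: these terms are illustrative only and are
# not taken from the proposal described above.
LEGAL_STOP_WORDS = {
    "the", "of", "and", "to", "a", "in", "that",
    "court", "plaintiff", "defendant", "herein",
    "thereof", "pursuant", "shall",
}

def content_word_counts(text, stop_words=LEGAL_STOP_WORDS):
    """Count the words in `text` that do not appear on the stop list."""
    tokens = re.findall(r"[a-z']+", text.lower())
    return Counter(t for t in tokens if t not in stop_words)

# Invented sample sentence, not a real court document.
sample = "The plaintiff shall pay damages to the defendant pursuant to the contract."
counts = content_word_counts(sample)
print(sorted(counts))
```

With the legalese filtered out, only the words that carry the sentence’s meaning remain to be counted.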
Here’s another example. Dalia and Nicole suggested a database of fairy tales called Digitales that aims to collect multiple versions of each fairy tale–both published and crowdsourced versions in order to try to maintain a sense of transmission and orality–and includes tools that compare different versions of the same story as well as tools to compare the same figure across multiple tales. One could, I imagine, discover once and for all, “what does the fox say?” There are tools for this kind of analysis out there and similar kinds of databases as well. But a nontrivial amount of effort goes into finding, cleaning and uploading the text…and then debugging the analyses. And, because all the systems in place to disseminate pre-cleaned texts are still invisible to the average scholar, this process is either repeated every time a new student wishes to study something or dismissed as too complex.** A project like this makes it easy to do research that, as of now, is still something of a pipe dream for most scholars.
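As a rough illustration of what “comparing versions” might mean computationally, the sketch below scores two versions of a tale by how many words they share in sequence, using Python’s standard `difflib` module. This is an assumption about one possible approach, not the Digitales design, and the two one-line “versions” are made up for the example.

```python
from difflib import SequenceMatcher

def version_similarity(a, b):
    """Return a 0-1 similarity ratio between two texts, compared word by word."""
    return SequenceMatcher(None, a.lower().split(), b.lower().split()).ratio()

# Invented one-sentence stand-ins for two versions of the same tale.
versions = {
    "version_a": "the fox tricked the crow and took the cheese",
    "version_b": "the fox flattered the crow and stole the cheese",
}

score = version_similarity(versions["version_a"], versions["version_b"])
print(round(score, 2))
```

A score near 1 means the versions are nearly identical; the interesting scholarly work starts with the places where they diverge.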
Digitales will also include a timeline element so that the user can trace the evolution of a particular story over the ages. This is one of several projects (five, if I recall correctly) that are interested in spatializing and temporalizing knowledge. Nissa’s project, DIEGeo, aims to not only collect data on early 20th-century expatriated writers from the paper trails they leave behind, but also create an interactive timeline that displays which writers were where at what point in time. As with the fairy tale database, DIEGeo wants to literalize the way we “see” connections between authors. We can observe how the interwar authors move through time and space (without the use of a TARDIS), which opens up new avenues of charting influence and rethinking interactions.
Display, see, watch…these are the verbs that make up these projects. I’ll throw in one more–look. These projects change the way we look at knowledge production. They prioritize organizing the texts and images and (meta)data that make up our cultural and textual artifacts in such a way that it becomes easy to ask new questions merely by looking at them. Because the preliminary research is already done (mapping all of French New Wave cinema in real and imaginary space, e.g.), it becomes possible to start asking larger-scale questions that investigate more complex forms of interaction.
So here are the questions that humanists would be asking if the infrastructure were up to it. These are all projects that are buildable in theory (and, in Gabe’s case, in practice), but that require serious computational and infrastructural support. A lone scholar could never build one of these and, even having built it, afford to maintain it. But, these projects seem to say, just think of the critical and pedagogical opportunities that would arise if we had these databases at our disposal.
Now for the flip side of the question. What is absent?
With the very notable exception of Juan’s ToMMI (pronounced toe-me) tool for topic modeling images, there were no analytic tools proposed. Many of the databases incorporated already extant tools (and, in a larger sense of the term, one could argue that the database itself is a tool). Still, in retrospect, I’m surprised to see so few suggestions for text analysis tools or, even better, text preparation tools. Why?
And here’s the bit where I extrapolate from insufficient data. I think it’s harder to conceptualize and defend a tool than a database. Many text analysis tools already exist, so why write an $80,000 grant proposal for something that someone else has already done?
On the other hand, how do you conceptualize a tool that hasn’t been invented yet? What would an all-in-one text prep tool look like?*** And would it even be possible to create one? And, even if you did, could you easily defend why it was interesting until you actually used it to produce knowledge? I can make an argument for the particular kind of knowledge that each of these projects creates/uncovers/teaches. But the tools that we need to make text analysis approachable are difficult to argue for because the argument comes down to “this makes text analysis easy and that will, hopefully, provide interesting data”.
As Jeremy Douglass, the head of Transcriptions, points out, many digital projects begin with the goal of answering critically inflected questions about the media they study and quickly become investigations into the logistics of building the project. This is, arguably, a feature rather than a bug. As Lindsay Thomas and Alan Liu pointed out at Patrik Svensson’s talk on “Big Digital Humanities”, our problem with data isn’t that it’s big; it’s that it’s messy. So, to apply Jeremy’s articulation of the situation in a way that hits close to home for me, the first question one must answer when transforming one or several novels into social network graphs is not “what patterns of interactions do we find?” but “does staring at someone count as an interaction?” Nineteenth-century heroes do a lot of staring. Is that a different kind of interaction from speaking? Can and should I code for that? Will a computer be able to recognize this kind of interaction? Does that matter to me? At that point, two years might have gone by and one has an article about how to train a computer to recognize staring in novels, but has barely begun thinking about the interpretive moves one had planned to make regarding patterns of interactions. This is a critical step in thinking. It helps us answer questions we never even thought to ask. It changes the way we think about and approach texts. It forces us to stretch different muscles because technological (and sociological and economic) affordances matter and constraints, as the OuLiPo movement argues, may be necessary to do something innovative.
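The staring question is easy to state in code, which is part of why it is so easy to get lost in. The sketch below is a hypothetical illustration in Python, with invented characters and hand-coded interactions rather than anything extracted from a real novel. It builds the same character network twice, once counting only speech and once counting staring as well, and the graph changes shape depending on that single decision.

```python
from collections import defaultdict

# Invented, hand-coded interactions: (source, target, kind).
# In practice, producing this list is the hard part.
interactions = [
    ("A", "B", "speaks"),
    ("A", "B", "stares"),
    ("B", "A", "speaks"),
    ("A", "C", "stares"),
]

def build_graph(interactions, kinds):
    """Return undirected edge weights, counting only the given interaction kinds."""
    weights = defaultdict(int)
    for source, target, kind in interactions:
        if kind in kinds:
            weights[frozenset((source, target))] += 1
    return dict(weights)

speech_only = build_graph(interactions, {"speaks"})
with_staring = build_graph(interactions, {"speaks", "stares"})

print(len(speech_only), "edge(s) if only speech counts")
print(len(with_staring), "edge(s) if staring counts too")
```

Here character C only enters the network if staring counts as an interaction, so the interpretive choice comes before any pattern-finding can begin.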
The downside is that we get caught up in answering the questions we know how to answer. Which is what is so fantastic about these project proposals and why I find them so compelling. They get to grapple with these problems without losing sight of why they do so. Corrigan dealt with this explicitly when presenting on MiRa, her mixed race film database. How, she asks, do we construct a database of mixed race when race itself is constructed? The project becomes a way of thinking about material and cultural constructions through the making of this database that is itself both a form of critical inquiry and an object of it.
I see all these proposals as the first steps in answering Johanna Drucker’s article in the most recent DHQ, where she offers suggestions towards “a performative approach to materiality and the design of an interpretative interface. Such an interface,” she argues, “supports acts of interpretation rather than simply returning selected results from a pre-existing data set. It should also be changed by acts of interpretation, and should morph and evolve. Performative materiality and interpretative interface should embody emergent qualities.”
Now all we need to do is get them built. But that, I think, is a task for another day. Winter break is about to begin.
*For a given definition of the term.
**I will easily grant that there are a number of problems with making texts that have been chunked, lemmatized, stripped of all verbs, de-named, etc. available and that doing so will open them up to misuse. I also think that the idea of a TextHub modeled on GitHub (or even using GitHub for out-of-copyright materials) where different forms of text preparation are forked off of the original and clearly documented should be embraced by the DH community.
***I may be showing my hand here, but I really want one of these.