Over the course of the past week, I discovered something about myself. I am very bad at directionless, ludic interaction. I feel like Wendy Darling, having forgotten how to be young and play with Peter Pan. But there it is; I find it extraordinarily difficult to think about this text without some goal in mind, some way of imagining its usefulness (however broadly I define that term). I had to believe I was dealing with the text in a potentially useful interpretive manner before I could think of anything to do to it. Once I got to dealing with the actual visualizations, however, I found “fun” came a bit more easily.
First, I take on the problem of the graph. The following graphs are the results of the Craig Zeta text analysis Excel macro, and what they show, briefly, is where sections of Daniel Deronda (coded by me as relating either to Deronda or Gwendolen) fall in relation to one another based on the words they use. The rest of this paragraph is skippable if you don’t actually care about how the graph works and want to jump to the visualizations. The X axis refers to the percentage of words in that section that are judged (by the macro itself) as being more relevant to Deronda, while the Y axis refers to the percentage of words in that section that are judged as being more relevant to Gwendolen. So a section that falls around .2 for Deronda and .05 for Gwendolen is probably about Deronda, because roughly 20% of the words it uses are Deronda words (what the macro calls marker words) against only about 5% Gwendolen words. I should note that it counts all the instances of a word like “the” as one word. Conveniently, it also provides a list of the words it judged most relevant. But we don’t care about the list. We care about the graphs. They follow; feel free to click for larger images.
Startlingly ugly and kinda useless, isn’t she? It looks like slides I remember viewing from 9th grade biology.
Well, it looks less like the mating dance of the hairball and you can see where the individual sections fall (and notice the weird stuff going on in the middle…it’s not actually that weird, by the way, those are just chapters where both Gwendolen and Deronda are the viewpoint characters). But it’s very pale and still not pretty.
This is what is known as good-enough graphing. It’s still not pretty, but it’s legible and the colors don’t clash, so I’d say we’re moving up in the world. The larger data points also make it easier to see clustering in the places that they overlap. It’s easier to see the broader shapes made by the sections and, while the graph doesn’t say much overall, it finally provides a decent macro-view of the division of the book.
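For anyone curious about the arithmetic behind the axes, here is a minimal Python sketch of how a single section could be placed on the graph, going only by the description above. It is not the macro's own code: the function name, the tokenizing rule and the marker words in the example are all my assumptions, and the real macro also has to work out the marker lists, which this skips entirely.

```python
import re

def marker_proportions(section_text, deronda_markers, gwendolen_markers):
    """Return (x, y): the share of a section's distinct words that are
    Deronda markers and Gwendolen markers respectively.

    Assumption: "words" are lowercase runs of letters and apostrophes.
    """
    # Work with distinct words, so every "the" in the section counts once.
    distinct_words = set(re.findall(r"[a-z']+", section_text.lower()))
    if not distinct_words:
        return 0.0, 0.0
    x = len(distinct_words & set(deronda_markers)) / len(distinct_words)
    y = len(distinct_words & set(gwendolen_markers)) / len(distinct_words)
    return x, y

# Placeholder text and marker words, invented purely for illustration.
section = "the the the alpha beta gamma delta"
x, y = marker_proportions(section, {"alpha"}, {"beta", "gamma"})
print(x, y)
```

A section with a high x and a low y lands in Deronda territory, and the reverse puts it with Gwendolen.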
Usefulness: Craig Zeta’s charts are mostly useful because, when they separate, you (meaning I) get some validation for your hypothesis. I assumed there would be specific words that show up more frequently around Deronda than Gwendolen and I was right. (Anyone who knows this novel should be able to think of about four off the top of her head.) But CZ gives me a list of 200 words that it found distinctive between the two and those words are good starting points for further exploration into the text.
Also, the actual clustering of the text is interesting. Why does Gwendolen appear to cluster more tightly than Deronda? Who is encroaching further on whose territory?
So those are the graphs. As I said, the process is useful, but the actual visualizations are just there so that you have something to show for the process, unless you’re using it for its intended purpose (trying to figure out which of two authors wrote a disputed text: whichever author’s cluster the disputed text falls into probably wrote it). But using things beyond their intended purposes is fun, and I am trying to have fun, after all.
Another possibly useful analysis I came across was the Phrase Net. The Phrase Net is available as part of IBM’s wonderful Many Eyes web visualization to(ol | y) and it works by taking the plain text you put in and creating a network of words connected to one another in that text. You can define the parameters for connection, though the default is two words connected by “and”. Other options are “of the,” “the,” “a” and simply a space. The following phrase nets were made using the word “and” and I found certain elements in them interesting. Click on the links in the captions to go to Many Eyes and play with the originals.
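As an aside, the default “and” setting can be roughly imitated in a few lines of Python. This is only a sketch of the idea of pairing up words joined by “and”. It is not how Many Eyes is implemented, the function name and sample sentence are my own inventions, and it only counts pairs rather than drawing the network.

```python
import re
from collections import Counter

def and_pairs(text):
    """Count (left, right) word pairs joined by "and" in the text."""
    # The lookahead lets matches overlap, so "a and b and c" yields
    # both (a, b) and (b, c).
    pattern = re.compile(r"(?=\b(\w+)\s+and\s+(\w+)\b)")
    return Counter(pattern.findall(text.lower()))

# Invented sample text, not a quotation from the novel.
sample = "Lips and eyes, lips and eyes; hands and eyes and lips."
for (left, right), count in and_pairs(sample).most_common():
    print(left, "and", right, count)
```

In a net, each pair would become an arrow from the left word to the right one, with more frequent pairs drawn more heavily.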
So those are my phrase nets. They’re definitely an odd way of looking at a really large book, but I have to say what struck me the most about them was certain repeated usages of body parts. So let’s look at those sections more closely.
But, of course, the other interesting way to look at the groups is to look at words that appear in both, words that are similar in both, and words that are Gwendolen- or Deronda-specific…at least in their usage involving the word “and”.
Words in red appear in both, words in purple are similar and words in blue only appear in one. I personally find the lips versus mouth dichotomy to be kinda cool.
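The red and blue groups are really just set arithmetic, which a short Python sketch can show. The function name and the word lists here are placeholders I made up, not the actual contents of either net, and the purple “similar” group is missing on purpose because deciding that two words are similar takes a human reader.

```python
def compare_nets(words_a, words_b):
    """Split two word lists into shared words and words unique to each."""
    a, b = set(words_a), set(words_b)
    return {
        "both": sorted(a & b),      # would be colored red
        "only_a": sorted(a - b),    # would be colored blue
        "only_b": sorted(b - a),    # would be colored blue
    }

# Placeholder word lists, invented for illustration only.
net_a = ["lips", "eyes", "hands"]
net_b = ["mouth", "eyes", "hands"]
print(compare_nets(net_a, net_b))
```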
So what have I learned so far from this project? Well, I learned that playfulness comes in plenty of forms and that having an original goal can actually be conducive to playing around with the results. I also learned that I can spend hours on Photoshop for no good reason. And I think…I hope…I’m getting a broader sense of how visualizations can change the way I think about a text…but more on that later.
In the spirit of ludic interaction, I offer one more visualization of our project.