The Great Gatsby: A Mixed Literary Analysis

I never liked that Tom Buchanan.

I’m a firm believer that readers can use computational methods to uncover new insights into literature. As I’ve written about elsewhere, the quantitative can inform the qualitative, and vice versa. This goes for enjoying literature in whatever way you choose, whether you are a library card-carrying bibliophile, a beach reader, or an English teacher. I love exploring ways to computationally read texts. Not because numbers and graphs are the end in and of themselves. Hardly. Reading is about human beings better understanding themselves and the world around them. I just happen to believe that computers can also help us do that. In a recent experiment, I turned to one of the most taught novels in high schools across America: The Great Gatsby.

When I think about F. Scott Fitzgerald’s most successful novel, I immediately fixate on the interplay between the story’s characters, as well as the significance of the car as amoral independence made manifest. What happens when we invite software to read The Great Gatsby with us? What new questions do we ask? What lines do we find ourselves rereading?

To explore this further, we start with data. I wrote a program that took every word in the novel, identified every chapter in which it appeared, and then tallied those appearances. Since I’m interested in characters and the car, I focused on those. The result looks like this:
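For readers who want to try the tallying step themselves, here is a minimal sketch in Python. It is not the program described above, only one plausible way to do the same kind of count. It assumes the novel is available as plain text and that every chapter begins on its own line of the form "Chapter 1", "Chapter 2", and so on. The sample text, the heading pattern, and the function names are all hypothetical stand-ins.

```python
import re
from collections import Counter

# Hypothetical stand-in text, NOT the novel. A real run would load the full book.
SAMPLE_TEXT = """Chapter 1
Daisy laughed. Tom spoke to Daisy about the car.
Chapter 2
Tom took the car to town. Gatsby was not there.
"""

# Assumption: each chapter starts on a line that reads "Chapter <number>".
CHAPTER_HEADING = re.compile(r"^Chapter\s+(\d+)\s*$", re.MULTILINE)


def split_into_chapters(text):
    """Return {chapter_number: chapter_text}, split on the assumed headings."""
    # With one capture group, split() yields [preamble, num, body, num, body, ...].
    parts = CHAPTER_HEADING.split(text)
    return {int(parts[i]): parts[i + 1] for i in range(1, len(parts) - 1, 2)}


def tally_words(text):
    """Return {chapter_number: Counter mapping each lowercased word to its count}."""
    tallies = {}
    for number, body in split_into_chapters(text).items():
        # Words are runs of letters, so a possessive like "Tom's" still
        # counts toward "tom" (and leaves a stray "s" behind).
        tallies[number] = Counter(re.findall(r"[a-z]+", body.lower()))
    return tallies


def chapter_counts(tallies, word):
    """Return the count of `word` in each chapter, in chapter order."""
    return [tallies[number][word.lower()] for number in sorted(tallies)]


tallies = tally_words(SAMPLE_TEXT)
for term in ("gatsby", "daisy", "tom", "car"):
    print(term, chapter_counts(tallies, term))
```

Each list holds one number per chapter, which is exactly the shape a bar, area, or line chart needs. Note that this simple version counts the word "car" only, not "cars" or "coupé", so the choice of which word forms to fold together is itself an interpretive decision.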

Numbers alone aren’t very helpful. But with some simple data visualizations, we can start to analyze the data and surface new questions. Let’s start by looking at the narrator Nick Carraway and the mysterious Jay Gatsby.

Since the novel is in the first person, we don’t get a lot of Nick referring to himself. But that just makes the use of his name in the book all the more interesting. In the first chapter both names are used comparably, but then Gatsby’s name seems to fall off between the second and third chapters while Nick is still setting up the plot. Why does the main character’s name disappear like that? An intriguing question, but one I deferred.

Next, let’s look at Daisy and Tom, a relationship that always leaves me feeling disturbed when I read the book.

Tom’s domineering presence is quantitatively on display. Look at how the frequency of his name begins fairly equally with Daisy’s, which is a phenomenon we saw with Nick and Gatsby. (There’s mounting evidence that Fitzgerald introduces characters in this way. A rereading of Chapter 1 might prove it further.) But then, watch Tom overshadow his wife in Chapter 2. And yet, by Chapter 4, we start to see Daisy’s name usage in the novel increasing as she is finding her path out of the abusive relationship into the arms of Gatsby. But this is something Tom won’t allow. As we look at her name usage in the last chapter, it appears that she and Tom are somehow in sad balance again.

Now for the story’s narrative epicenter: Gatsby and Daisy.

In all the times I read and taught this novel, I don’t think I ever noticed just how Fitzgerald introduces Daisy to readers, far outshining Gatsby’s own name in the first and second chapters. Looking at the data, I’m also struck by how much more frequently Daisy’s name is used in Chapter 7. Gatsby’s name surges as well, of course, as the novel reaches its climax. But Daisy’s name more so. Perhaps the reason for her surge will come into focus if we add Tom to this area chart.

Whoa. Look at the sheer quantitative dominance of Tom Buchanan in Chapter 7. What’s more, observe the way Fitzgerald introduces Tom in Chapters 1 and 2, his name nearly always uttered somewhere. But much like his behavior in the book, the frequency of his name usage builds and brews and then blows up. Relatedly, the thinness of his name’s use in Chapters 8 and 9 might suggest that he, like a ravenous beast that gorges on his kill, rests when he is done. And like a beast on the hunt, Tom uses whatever advantage he can to succeed. In this case, a car. Let’s look now at how the word ‘car’ factors into the story.

As you can see, the word appears consistently throughout the novel, never frequent but always present somewhere. Then, we observe just how correlated the car is with the love triangle among Gatsby, Daisy, and Tom. A line graph shows the relationship even more cleanly.
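If we want to put a number on "correlated," one common choice is the Pearson correlation coefficient between two per-chapter count series. The sketch below is an assumption about how that could be measured, not a calculation performed for this analysis, and the two short lists are deliberately artificial placeholders rather than counts from the novel.

```python
import math


def pearson(xs, ys):
    """Pearson correlation between two equal-length series of counts."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two series of the same length, at least 2 long")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Sum of products of deviations from the mean (unnormalized covariance).
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sum((x - mean_x) ** 2 for x in xs)
    spread_y = sum((y - mean_y) ** 2 for y in ys)
    if spread_x == 0 or spread_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return covariance / math.sqrt(spread_x * spread_y)


# Artificial placeholder series, NOT real counts from The Great Gatsby.
placeholder_car = [1, 2, 3, 4, 5]
placeholder_name = [2, 4, 6, 8, 10]

print(round(pearson(placeholder_car, placeholder_name), 3))
```

A value near 1 means the two series rise and fall together, near -1 means they move in opposite directions, and near 0 means no linear relationship. With only nine chapters, any such figure is a prompt for rereading rather than proof of anything.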

My computational reading of the novel draws me to this passage in Chapter 7 I hadn’t noticed originally. Fitzgerald writes:

There is no confusion like the confusion of a simple mind, and as we drove away Tom was feeling the hot whips of panic. His wife and his mistress, until an hour ago secure and inviolate, were slipping precipitately from his control. Instinct made him step on the accelerator with the double purpose of overtaking Daisy and leaving Wilson behind, and we sped along toward Astoria at fifty miles an hour, until, among the spidery girders of the elevated, we came in sight of the easy-going blue coupé.

Fitzgerald might say that it is Tom’s wife and mistress “slipping precipitately” from his control, but Tom too is slipping from his own control. He experiences “hot whips of panic.” In life, some gain control by slowing down, breathing deeply. Others speed things up, like a child trying to find balance on a two-wheeled bicycle. Tom does the latter. He thinks speed will restore his control over life. Tom might be “of a simple mind,” but don’t mistake him for someone who doesn’t need to know precisely that everything–and everyone–is in its proper place. And he will go to vulgar depths to ensure that order is restored.


I’d love to hear what you see in the data. What questions emerge for you? What parts of the book do the data lead you to? And what new insights emerge as you reread the text with the numbers in mind? 

