So I made this interactive visualization about the 5 Game of Thrones books. How?
The project
The visualization is based on the events that happen to the main characters of the books. With over 2000 characters and close to 5000 pages over 343 chapters, it’s not possible to show everything, so I took about 300 characters and restricted myself to a small selection of events, such as characters killing each other. I also grouped the characters in a 2-level hierarchy so that it would be easier to find them and see what happens at a higher level.
Data
Data is the first word in data visualization, and in order to visualize one must collect data.
This has not been a small task. When I read the books, which was a while back (before A Dance with Dragons was published), I already had half a mind to make a visualization, so I jotted down some notes, but I had no clear idea of what it would look like. I did start writing down when in the books characters died. Eventually I realized that if I wanted to prioritize characters, I had to find a way to discriminate between those who appeared infrequently and could be left out, and those who were recurring. So I had to determine when the various characters appeared and what happened to them.
I had the five books in print, which is definitely not the best starting point for this, so I looked for something to scrape and approached the problem on two fronts. On one hand, I got a raw text version of the books, but it was very hard to scrape. For instance, there are at least 11 different characters named Pate (just Pate), and 23 called Jon something. Besides, many have aliases, titles and other names, so a query to find all instances of “Jon” won’t capture all mentions of, say, Jon Snow, but will also return appearances of the Jon Conningtons, Jon Arryns and the like. To make matters worse, my text file was scanned from the book and was of less than optimal quality, with many typos in names.
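To make the name problem concrete, here is a minimal sketch in Python of what alias-based matching could look like. It is not the code used for the project: the alias table, the `count_mentions` helper and the sample sentences are all made up for illustration.

```python
import re

# Hypothetical alias table: the character names come from the text above,
# the alias lists are illustrative and far from complete.
ALIASES = {
    "Jon Snow": ["Jon Snow", "Lord Snow"],
    "Jon Connington": ["Jon Connington", "Griff"],
    "Jon Arryn": ["Jon Arryn"],
}

def count_mentions(text, aliases):
    """Count whole-word matches of any alias, per character."""
    counts = {}
    for character, names in aliases.items():
        # Try longer aliases first so a short one never shadows a long one.
        ordered = sorted(names, key=len, reverse=True)
        pattern = r"\b(?:" + "|".join(re.escape(n) for n in ordered) + r")\b"
        counts[character] = len(re.findall(pattern, text))
    return counts

# Made-up sample text, only there to exercise the functions.
sample = (
    "Jon Snow rode north. They called him Lord Snow. "
    "Jon Arryn was dead. Griff watched the river."
)

# A naive query for "Jon" mixes two characters and misses two aliases.
naive_hits = len(re.findall(r"\bJon\b", sample))
per_character = count_mentions(sample, ALIASES)
print(naive_hits, per_character)
```

A bare “Jon” with no surname stays ambiguous with this approach, and typos from a scanned file would still slip through, which is why some manual checking remains necessary.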
The other source was the two fan-maintained resources on the series, Tower of the Hand and A Wiki of Ice and Fire, which both contain summaries of the chapters and information on the characters. Some chapters are described in meticulous detail, specifying every character that appears and everything that happens. Others are more loosely narrated. That said, both sites offer exhaustive lists of the books’ characters, which were extremely useful.
So I first scraped A Wiki of Ice and Fire to know which characters were mentioned in each chapter, then read the summaries to get a feel for the events happening, which I recorded and maintained by hand.
With that first level of material, I decided to keep characters mentioned at least 5 times, plus any named character who had been killed by another named character (as opposed to “a guard” being killed by “a soldier”). That left me with about 250 characters (out of slightly over 2000). Later, when the visualization became usable, I found some inconsistencies while playing with it: how come this character is not dead yet in that book? That was because some characters were missing from my roster. So by checking in the original books, I increased the roster to about 300 (296 precisely). Also, for most (but not all) characters, I was able to use the text file to get all the mentions of a given character in all the books.
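The selection rule above can be sketched in a few lines of Python. This is only an illustration under assumptions: the data shapes (`mentions` as chapter/character pairs, `kills` as killer/victim pairs with `None` standing for an unnamed character) and the `build_roster` name are hypothetical, and the sample data uses placeholder names rather than real events.

```python
from collections import Counter

def build_roster(mentions, kills, min_mentions=5):
    """Keep characters mentioned often enough, plus named victims of named killers.

    mentions: iterable of (chapter, character) pairs.
    kills: iterable of (killer, victim) pairs; None marks an unnamed character.
    """
    counts = Counter(character for _, character in mentions)
    frequent = {c for c, n in counts.items() if n >= min_mentions}
    named_victims = {
        victim
        for killer, victim in kills
        if killer is not None and victim is not None
    }
    return frequent | named_victims

# Placeholder data: "A" is mentioned in 6 chapters, "B" and "C" once each.
mentions = [(chapter, "A") for chapter in range(6)] + [(1, "B"), (2, "C")]
# "A" kills "B" (both named); "C" is killed by an unnamed character.
kills = [("A", "B"), (None, "C")]

roster = build_roster(mentions, kills)
print(sorted(roster))
```

Here “B” makes the roster despite a single mention because a named character killed him, while “C” is left out.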
Data analysis
I wanted to do something around the relationships among characters, and I soon noticed that there are many cliques, that is, groups of characters where every one of them trusts every other one. This is the case for most families or organizations. When one character defects, this is clearly signalled. You never get a situation where A trusts B but not C, B trusts C but not A, and C trusts A but not B, or anything complex really.
But still, that’s many, many groups.
While families and groups are presented in the books as independent entities, they almost always align with a larger, more powerful one. So it was interesting to gather the smaller groups into larger alliances, especially if the focus was to represent kills.
In the books, most characters belong to or serve noble houses, and those who don’t, belong to well-identified groups. There are very few characters who just mind their own business. There is a plethora of such Houses, which can make things confusing (and again: 2000 characters). After several attempts I concluded it’s neither possible nor a good idea to represent this diversity visually. Instead, I tried to “group the groups” and to create higher-level aggregates.
Eventually (and I did that fairly late in the process, after a few tries on the visualization) I created 5 groups. There is one each for the Starks and the Lannisters, the families that receive the most attention in the books, as 70% of the chapters are written from the point of view of a member of either family. Also, contrary to House Targaryen, whose point of view accounts for about 10% of the books, Starks and Lannisters have many allies and followers. So, as consistent groups, they are larger and more interesting.
The other 3 groups are as follows: antagonists, that is, aggressive characters (including monsters) who may attack any other; neutral characters, who tend to stay out of conflict; and opportunists, who look for more power.
Each of the 5 groups exhibits different patterns when it comes to killing: Starks (“the good guys”) don’t kill their own or neutral characters, but may have to fight characters from the other groups; conversely, some characters in the Lannister clan or among opportunists may carry out assassinations where anyone can be targeted. Neutral characters don’t fight except against antagonists, and the latter may fight characters from any group.
Drawing the visualization
I started thinking about this project a long time ago, and I’ve made experiments taking many forms. One such form was a previous visualization of the places in Game of Thrones, which was the low-hanging fruit of the dataset I was building and refining. I knew I wanted to show events happening to the characters. Originally I thought of something linear, like a Gantt chart, possibly grouping the characters by families which would be collapsible. But even when families are defined broadly there are a lot of them, so the visualization wouldn’t have been very legible.
What I had in mind then was to find a way to represent the status of the characters over time, who got killed, who got crippled, that sort of thing.
Eventually, I thought it was more interesting to represent the relationships of characters among themselves, so I started to take note of all the interactions between characters, such as who kills whom, who captures whom, who marries whom, etc. There were many that didn’t make it into the final visualization, which is already complex enough as is.
I thought of the chord form early because it can represent a lot of nodes and a lot of relationships among them. Even if it’s difficult to see an individual node except the most important ones, and even more difficult to see an individual relationship, it’s possible to get a vague idea of mass. So I thought of representing characters as circles around a main circle, coloring them by family or something. But that doesn’t work: there are just too many different families. By so doing I was just plotting complexity.
Then I realized that one very important aspect of the story, that is, one way in which a visualization could actually help understand what’s going on in the books, is that of trust. Within a group, all characters trust each other. Actually, this is much simpler than in real life: Westeros families are very close-knit; there are no murders among siblings, even though such things were commonplace in History! In network parlance, a group of entities which are all connected to each other is called a clique. And Game of Thrones is really a game of cliques. In all key moments of the books, one character of the clique will change sides. All the other characters of that clique continue to trust him, without realizing that he is setting them up, and a string of murders usually ensues.
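The clique idea is easy to check mechanically. The following Python sketch assumes trust is stored as directed pairs (“a trusts b”); that representation, the `is_clique` name and the placeholder members are assumptions made for the example, not how the visualization stores its data.

```python
from itertools import permutations

def is_clique(members, trusts):
    """True if every member trusts every other member.

    `trusts` is a set of (a, b) pairs meaning "a trusts b".
    """
    return all((a, b) in trusts for a, b in permutations(members, 2))

family = ["A", "B", "C"]

# Everyone trusts everyone: a clique.
full_trust = set(permutations(family, 2))

# "C" changes sides: C no longer trusts A or B,
# but A and B have not noticed and still trust C.
defection = full_trust - {("C", "A"), ("C", "B")}

print(is_clique(family, full_trust), is_clique(family, defection))
```

The second case is the dangerous one in the books: the group is no longer a clique, yet the pairs `("A", "C")` and `("B", "C")` are still present.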
So I decided to show action only at the clique level (families, organizations…). The problem I had was that once a character dies the representation of the clique won’t change much, whereas if I represented characters individually I could reflect that state of affairs.
So I thought of drawing one circle per clique around the main circle, and to represent characters individually within those circles using the packed circle method.
The method I chose was good (but not completely accurate) at preserving the relative importance of one clique compared to all the others, but just barely ok at preserving the relative importance of one given character.
I take all the mentions of all the characters, tally them by clique, then take the square root of each clique’s total. Then, for each clique, I compute the ratio of that square root to the sum of the square roots of all the cliques.
I multiply that ratio by 2π to get an angle: the “slice” of the main circle that will be occupied by the circle corresponding to that clique. Picture:
(btw click on the circle on the left to regenerate the data points)
So while those proportions don’t exactly match, they are very, very close. That doesn’t hold at the character level, because the sum of the areas of the character circles can occupy anywhere between 50% and 100% of the area of the larger circle. But that’s not important. Accuracy is not important, as long as it is sufficient to say: this character appears often and this one doesn’t.
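The slice computation can be sketched as follows in Python. It assumes the denominator is the sum of the square roots over all cliques, so that the slices add up to a full turn; the `clique_angles` name and the mention counts are invented for the example.

```python
import math

def clique_angles(mentions_by_clique):
    """Map each clique to the angle (in radians) of its slice of the main circle."""
    # Square roots, because a circle's area grows with the square of its radius.
    roots = {clique: math.sqrt(n) for clique, n in mentions_by_clique.items()}
    total = sum(roots.values())
    return {clique: 2 * math.pi * root / total for clique, root in roots.items()}

# Invented tallies: square roots are 20, 10 and 10, so the shares are 1/2, 1/4, 1/4.
angles = clique_angles({"first": 400, "second": 100, "third": 100})
for clique, angle in angles.items():
    print(clique, round(math.degrees(angle), 1))
```

A clique with four times the mentions gets only twice the angle, which is what keeps the areas of the clique circles roughly proportional to the number of mentions.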
Two other technical points about the making of the viz.
All positions for all possible time periods had to be computed ahead of time.
In d3, it is natural to add, add, add stuff over time without worrying so much. More data? We’ll just add more datapoints.
Here I couldn’t really do that because I allowed the user to go back and forth in time. So a user could set the visualization in autoplay and go from time 0 to time 50, for instance, then pause and jump to time 200 and then back to time 25.
So it wasn’t possible to read the data file in sequence and draw some additional data points at each step. In the above example, all that happens between time 50 and time 200 has to be shown at once, and then all that happened between time 25 and time 200 has to be hidden at once.
So it’s just a matter of separating the code that calculates all the positions from the code that draws the viz, two operations which more often than not are intertwined.
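Here is a minimal sketch of that separation, written in Python rather than d3 to keep it short. It precomputes one snapshot per time step, so that jumping to any time is a simple lookup. The event format, the `precompute_states` name and the sample events are assumptions made for the example; the real visualization precomputes positions, not just statuses.

```python
def precompute_states(events, last_step):
    """Return states[t]: the status of every character after all events up to time t.

    events: list of (time, character, status) tuples.
    Events later than last_step are ignored.
    """
    by_time = {}
    for time, character, status in events:
        by_time.setdefault(time, []).append((character, status))

    states = []
    current = {}
    for t in range(last_step + 1):
        for character, status in by_time.get(t, []):
            current[character] = status
        # Store a copy, otherwise every snapshot would share the same dict.
        states.append(dict(current))
    return states

# Placeholder events, not taken from the books.
events = [(10, "A", "captured"), (60, "A", "dead"), (120, "B", "dead")]
states = precompute_states(events, 200)

# The drawing code only ever needs states[t], whatever the previous time was.
for t in (50, 200, 25):
    print(t, states[t])
```

Storing a full snapshot per step trades memory for simplicity, which is fine for a few hundred characters and a few hundred time steps.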
Last, in the visualization I get to write group names in a circle around the main circle. How is this done?
Well, in SVG, you can’t write “on a circle”. You can write on a path, which can be anything, a circular arc for instance. In this case it’s a bit more complicated because I wanted to make sure that the writing would not be upside down. So I actually used two arcs.
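As an illustration of the two-arc idea, the Python sketch below builds the SVG path data for two half circles and the markup for a label placed on one of them. It assumes one arc for the top half and one for the bottom half, both drawn from left to right so that text stays upright; the helper names and the sample label are hypothetical, and this is not the code of the visualization.

```python
from xml.sax.saxutils import escape

def label_arcs(cx, cy, r):
    """Return path data for the top and bottom half circles, both drawn left to right."""
    start = f"M {cx - r} {cy}"
    end = f"{cx + r} {cy}"
    # Sweep flag 1: clockwise on screen, left to right over the top.
    top = f"{start} A {r} {r} 0 0 1 {end}"
    # Sweep flag 0: counterclockwise on screen, left to right under the bottom.
    bottom = f"{start} A {r} {r} 0 0 0 {end}"
    return top, bottom

def text_on_arc(path_id, label):
    """Return a <text> element that centers `label` on the path with id `path_id`."""
    return (
        '<text text-anchor="middle">'
        f'<textPath href="#{path_id}" startOffset="50%">{escape(label)}</textPath>'
        "</text>"
    )

top, bottom = label_arcs(100, 100, 80)
print(f'<path id="top-arc" d="{top}" fill="none"/>')
print(f'<path id="bottom-arc" d="{bottom}" fill="none"/>')
print(text_on_arc("top-arc", "Group name"))
```

A label that falls in the upper half goes on the first arc and a label in the lower half on the second, so neither ends up upside down. Older browsers may need `xlink:href` instead of `href` on `textPath`.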
This is fantastic. Have you made your raw data set available publicly?
Yes, all CSV files are easily accessible from the source. The dataset is not documented, but it is pretty self-explanatory.