A Game of Thrones tutorial¶

Game of Thrones (GoT) is an American fantasy drama TV series, created by D. Benioff and D.B. Weiss for the American television network HBO. It is the screen adaption of the series of fantasy novels A Song of Ice and Fire, written by George R.R. Martin. The series premiered on HBO in the United States on April 17, 2011, and concluded on May 19, 2019, with 73 episodes broadcast over eight seasons. With its 12 million viewers during season 8 and a plethora of awards---according to Wikipedia, Game of Thrones has attracted record viewership on HBO and has a broad, active, and international fan base.

The intricate world narrated by George R.R. Martin and scripted by Benioff and Weiss has stimulated the curiosity of ranks of scientists, delighted by the opportunity to study complex social phenomena. In this notebook, we delve into the study of GoT relationships to discover what the hypergraphs they generate reveal about the story.

In this notebook, we replicate some of the analysis you can read in our paper at this link!

What we need... installing and loading packages¶

In [46]:
using Pkg
In [47]:
#pkg"add PyCall Conda SimpleHypergraphs PyPlot GraphPlot ColorSchemes"

Prerequisites for plotting¶

In [1]:
using PyCall
using Conda
In [49]:
#Conda.runconda(`install matplotlib --yes`)
#Conda.runconda(`install networkx --yes`)
#run(`$(PyCall.python) -m pip install hypernetx==1.2.5`)
In [2]:
using SimpleHypergraphs
using Graphs
using PyPlot
using GraphPlot, ColorSchemes

The data set¶

This study is based on the dataset at the GitHub repository of Jeffrey Lancaster Game of Thrones Datasets and Visualizations. We will thus focus on the GoT TV series.

We studied GoT characters' co-occurrences with different levels of granularity. We modeled the GoT data set building three different types of hypergraphs, each one reporting whether the GoT characters have appeared in the same season, in the same episode, or in the same scene together.

Hypergraph with each season as an edge¶

First, we load a hypergraph studying characters' co-occurences within seasons. Here, the hyperedges are the GoT seasons and the characters who appear in each eason are the nodes.

In [3]:
pth = joinpath(dirname(pathof(SimpleHypergraphs)),"..","tutorials", "basics", "data", "hg_seasons_all.json");
In [4]:
h = SimpleHypergraphs.hg_load(pth; format=JSON_Format(), T=Bool, V=Symbol, E=Symbol);
In [5]:
# how many characters did we see during the overall TV series?
size(h)[1]
Out[5]:
577
In [6]:
# how many characters does each season have?
# we can ask this way...
map(he -> println("Season $he has $(length(getvertices(h, he))) characters"), 1:nhe(h));
Season 1 has 125 characters
Season 2 has 137 characters
Season 3 has 137 characters
Season 4 has 152 characters
Season 5 has 175 characters
Season 6 has 208 characters
Season 7 has 75 characters
Season 8 has 66 characters
In [7]:
# ... or this way
length.(h.he2v)
Out[7]:
8-element Vector{Int64}:
 125
 137
 137
 152
 175
 208
  75
  66

SimpleHypergraphs integates the Python library HyperNetX to let the user visualize a hypergraph h exploiting an Euler-diagram visualization. For more details, please refer to the library HyperNetX.

In [8]:
pth = joinpath(dirname(pathof(SimpleHypergraphs)),"..","tutorials", "basics", "data", "hg_seasons_min.json");
In [9]:
# Let's visualize (a smaller parte of) the hypergraph we built
# To build this smaller hypergraph, we considered only those characters 
# appearing at least in 10 scenes in the whole series
h1 = SimpleHypergraphs.hg_load(pth; format=JSON_Format(), T=Int, V=Symbol, E=Symbol);
In [10]:
map(he -> 
    println("Season $he has *$(length(getvertices(h1, he)))* characters appearing in at least 10 scenes"), 
    1:nhe(h));
Season 1 has *69* characters appearing in at least 10 scenes
Season 2 has *74* characters appearing in at least 10 scenes
Season 3 has *83* characters appearing in at least 10 scenes
Season 4 has *83* characters appearing in at least 10 scenes
Season 5 has *75* characters appearing in at least 10 scenes
Season 6 has *98* characters appearing in at least 10 scenes
Season 7 has *56* characters appearing in at least 10 scenes
Season 8 has *44* characters appearing in at least 10 scenes
In [11]:
length.(h1.he2v)
Out[11]:
8-element Vector{Int64}:
 69
 74
 83
 83
 75
 98
 56
 44
In [12]:
# viz params: edge labels
edge_labels = Dict{Int, String}(map(x -> x=>"S$x", 1:nhe(h)))
edge_labels_kwargs = Dict{String,Any}("fontsize" => "x-large")
;
In [13]:
SimpleHypergraphs.draw(
    h1, 
    HyperNetX;
    no_border = true,
    width = 7,
    height = 7,
    collapse_nodes = true, 
    with_node_counts = true, 
    with_node_labels = true,
    edge_labels = edge_labels, 
    edge_labels_kwargs = edge_labels_kwargs
)
In [14]:
# who are the characters appearing in all 8 seasons?
most_important_character_ids = findall(x->x==1, (length.(h.v2he) .== 8))

for id in most_important_character_ids
    println(SimpleHypergraphs.get_vertex_meta(h, id))
end
White_Walker
Jon_Snow
Sansa_Stark
Arya_Stark
Theon_Greyjoy
Cersei_Lannister
Jaime_Lannister
Tyrion_Lannister
Daenerys_Targaryen
Jorah_Mormont
Drogon
Rhaegal
Viserion
Lord_Varys
Samwell_Tarly
Bronn

Hypergraph with each scene as an edge¶

In [15]:
# Let's have a closer look of what's happening in season 1
pth = joinpath(dirname(pathof(SimpleHypergraphs)),"..","tutorials", "basics", "data", "hg_season1.json");
hg = SimpleHypergraphs.hg_load(pth; format=JSON_Format(), T=Bool, V=Symbol, E=Symbol);
In [16]:
# how many characters do we have? How many scenes?
"$(nhv(hg)) characters and $(nhe(hg)) scenes"
Out[16]:
"125 characters and 286 scenes"

The collaboration structure of Game of Thrones.¶

At this point, we had an overview about how many characters appeared over the whole GoT TV series and which one of them made it till the end.

Let's find out how these characters interacted with each other in season 1. To gather insights within these complex relationships, we run a community detection algorithm on the hypergraph built considering scenes as hyperedges.

Running this algorithm, we expect to find coherent plotlines and, therefore, groups of characters frequently appearing in a scene together.

In [17]:
# Let's assure to have a single connected component
# Otherwise, we pick the largest one
cc = get_connected_components(hg)
Out[17]:
2-element Vector{Vector{Int64}}:
 [1, 5, 3, 4, 2, 16, 12, 29, 34, 49  …  37, 87, 48, 44, 40, 106, 93, 101, 118, 119]
 [99]
In [18]:
remove_vertex!(hg, 99)
cc = get_connected_components(hg)
Out[18]:
1-element Vector{Vector{Int64}}:
 [1, 5, 3, 4, 2, 16, 12, 29, 34, 49  …  37, 87, 48, 44, 40, 106, 93, 101, 118, 119]
In [19]:
# We used the label propagation (LP) algorithm
cflp = CFLabelPropagationFinder(100,1234)

comms = findcommunities(hg, cflp)

"We found $(length(comms.np)) communities in the hypergraph of the 1st season."
Out[19]:
"We found 18 communities in the hypergraph of the 1st season."
In [20]:
length.(comms.np)
Out[20]:
18-element Vector{Int64}:
  1
  1
  4
 37
  1
  1
  1
 18
  3
  1
 16
  4
 13
  7
 12
  1
  1
  2
In [21]:
# Who are they?
for c in comms.np
    for character in c
       println(get_vertex_meta(hg, character)) 
    end
    println("-----")
end
White_Walker
-----
Ros
-----
Shaggydog
Nymeria
Grey_Wind
Lady
-----
Loras_Tyrell
Renly_Baratheon
Grand_Maester_Pycelle
Septa_Mordane
Old_Nan
Mycah
Hugh_of_the_Vale
Sansa_Stark
Hodor
Joffrey_Baratheon
Gregor_Clegane
Jon_Arryn
Jory_Cassel
Royal_Steward
Myrcella_Baratheon
Benjen_Stark
Rickon_Stark
Varly
Eddard_Stark
Robert_Baratheon
Beric_Dondarrion
Lancel_Lannister
Ilyn_Payne
Sandor_Clegane
Tommen_Baratheon
Cersei_Lannister
Joss
Janos_Slynt
Yoren
Petyr_Baelish
Arya_Stark
Lord_Varys
Barristan_Selmy
Summer
Tomard
Jaime_Lannister
Meryn_Trant
-----
Mhaegen
-----
Armeca
-----
Tobho_Mott
-----
Mirri_Maz_Duur
Viserys_Targaryen
Irri
Drogon
Viserion
Daenerys_Targaryen
Rhaegal
Mago
Khal_Drogo
Wine_Merchant
Doreah
Qotho
Jhiqui
Rakharo
Illyrio_Mopatis
Rhaego
Jorah_Mormont
Dothraki_Crone
-----
Osha
Wallen
Stiv
-----
Three-Eyed_Raven
-----
Vardis_Egen
Addam_Marbrand
Leo_Lefford
Chella
Robin_Arryn
Lannister_Messenger
Shae
Mord
Lysa_Arryn
Kevan_Lannister
Bronn
Timett
Marillion
Tyrion_Lannister
Shagga
Tywin_Lannister
-----
Walder_Frey
Joyeuse_Erenford
Stevron_Frey
Ryger_Rivers
-----
Alliser_Thorne
Ghost
Rast
Jon_Snow
Maester_Aemon
Jaremy_Rykker
Pypar
Othell_Yarwyck
Jafer_Flowers
Othor
Jeor_Mormont
Samwell_Tarly
Grenn
-----
Hot_Pie
Syrio_Forel
Lommy_Greenhands
Gendry
King's_Landing_Baker
Street_Urchin
Red_Keep_Stableboy
-----
Robb_Stark
Jonos_Bracken
Theon_Greyjoy
Will
Lannister_Scout
Bran_Stark
Greatjon_Umber
Catspaw_Assassin
Rodrik_Cassel
Catelyn_Stark
Maester_Luwin
Rickard_Karstark
-----
Wight_Wildling_Girl
-----
Little_Bird
-----
Waymar_Royce
Gared
-----

Some comments¶

Based on the communities above, we can say that the LP algorithm ran on such hypergraph revealed:

  • The presence of minor communities of characters, appearing only in few scenes of the whole season. It is interesting to note that the algorithm correctly identifies background characters that do heavily not contribute to the main storyline (for now).

    • (Syrio_Forel);
    • (Waymar_Royce, White_Walker, Gared);
    • (Three-Eyed_Raven);
    • (Hot_Pie, Red_Keep_Stableboy, Vayon_Poole, Gendry);
    • ...
  • The other communities embody the different sub-plotlines happening in the first season. We can list:

    • two communities related to Daenerys_Targaryen and the Dothraki;
    • one community related to the events happening in Castle Black and related to Jon Snow;
    • one community related to the Houses Arryn and Frey.
  • The last more significant community contains the majority of the characters appearing in the first season. This result makes sense as all these characters have been forced to stay together at the Red Keep. Thus, they appear in many scenes together.

How good these communities are in terms of modularity?¶

In [22]:
modularity(hg, comms.np)
Out[22]:
0.5040026252111678

Find communities by maximizing the value of modularity¶

In [23]:
# Here, we used a CNM-Like algorithm for finding communities.
cnm = CFModularityCNMLike(500)

cnm_comms = findcommunities(hg, cnm)

"We found $(length(cnm_comms.bp)) communities in the hypergraph of the 1st season with a modularity value of $(cnm_comms.bm)."
Out[23]:
"We found 16 communities in the hypergraph of the 1st season with a modularity value of 0.5121486201080864."
In [24]:
# How many characters are there in each community?

# size of each community
comms_size = length.(cnm_comms.bp)

# how many community of that size do exist?
values = unique(length.(cnm_comms.bp))
data = Dict([(i, count(x->x==i, comms_size)) for i in values])
Out[24]:
Dict{Int64, Int64} with 5 entries:
  5  => 1
  12 => 1
  76 => 1
  1  => 12
  19 => 1
In [25]:
# Who are they?
for c in cnm_comms.bp
    for character in c
       println(get_vertex_meta(hg, character)) 
    end
    println("-----")
end
White_Walker
Wight_Wildling_Girl
Waymar_Royce
Will
Gared
-----
Khal_Drogo
Wine_Merchant
Mirri_Maz_Duur
Doreah
Viserys_Targaryen
Qotho
Jhiqui
Little_Bird
Irri
Rakharo
Drogon
Viserion
Daenerys_Targaryen
Rhaegal
Illyrio_Mopatis
Rhaego
Jorah_Mormont
Dothraki_Crone
Mago
-----
Renly_Baratheon
Grand_Maester_Pycelle
Nymeria
Ros
Joyeuse_Erenford
Septa_Mordane
Alliser_Thorne
Hodor
Lady
Myrcella_Baratheon
Rast
Jon_Snow
Maester_Aemon
Eddard_Stark
Robert_Baratheon
Stevron_Frey
Lannister_Scout
Lancel_Lannister
Pypar
Sandor_Clegane
Greatjon_Umber
Joss
Petyr_Baelish
Othor
Samwell_Tarly
Arya_Stark
Summer
Lord_Varys
Barristan_Selmy
Wallen
Grenn
Red_Keep_Stableboy
Old_Nan
Mycah
Hugh_of_the_Vale
Sansa_Stark
Jon_Arryn
Joffrey_Baratheon
Robb_Stark
Jory_Cassel
Gregor_Clegane
Ghost
Royal_Steward
Shaggydog
Mhaegen
Armeca
Rickon_Stark
Syrio_Forel
Three-Eyed_Raven
Varly
Stiv
Beric_Dondarrion
Theon_Greyjoy
King's_Landing_Baker
Street_Urchin
Jaremy_Rykker
Ilyn_Payne
Osha
Bran_Stark
Cersei_Lannister
Othell_Yarwyck
Tommen_Baratheon
Janos_Slynt
Yoren
Marillion
Jafer_Flowers
Jeor_Mormont
Catspaw_Assassin
Walder_Frey
Rodrik_Cassel
Maester_Luwin
Grey_Wind
Jaime_Lannister
Catelyn_Stark
Ryger_Rivers
Meryn_Trant
-----
Benjen_Stark
-----
Lommy_Greenhands
-----
Lysa_Arryn
-----
Hot_Pie
-----
Addam_Marbrand
Leo_Lefford
Bronn
Chella
Timett
Lannister_Messenger
Tyrion_Lannister
Shae
Shagga
Tywin_Lannister
Mord
Kevan_Lannister
-----
Tobho_Mott
-----
Gendry
-----
Tomard
-----
Vardis_Egen
-----
Robin_Arryn
-----
Jonos_Bracken
-----
Rickard_Karstark
-----
Loras_Tyrell
-----

Let's visualize these communities¶

We will visualize the obtained communities through a two-section representation of the hypergraph.

In [26]:
# As a TwoSectionView can be instantiated only for hypergraphs with non-overlapping edges,
# here we will use Graphs.jl directly.
m = get_twosection_adjacency_mx(hg;replace_weights=1)

t = Graph(m)
Out[26]:
{124, 886} undirected simple Int64 graph

Just a few viz params¶

In [29]:
my_colors = vcat(ColorSchemes.rainbow[range(1, stop=length(ColorSchemes.rainbow), step=2)], ColorSchemes.rainbow[2]);
In [30]:
function get_color(i, comms, colors)
    for j in 1:length(comms)
        if i in comms[j]
            return colors[j % length(colors) + 1]
        end
    end
    return "black"
end;
In [31]:
degrees = [Graphs.degree(t, v) for v in Graphs.vertices(t)];

dsize = fill(1.3, 124)
dsize += degrees/maximum(degrees);
In [32]:
# evaluate comms labels
function get_labels(comms)
    labels = fill("", 124)
    index = 1

    for c in comms
        labels[first(c)] = "C$(index)"
        index += 1
    end
    
    labels
end

labels = get_labels(comms.np)
cnm_labels = get_labels(cnm_comms.bp);

Communities from Label Propagation¶

In [33]:
gplot(
    t,
    nodesize = dsize,
    nodelabel = labels,
    nodelabeldist=2.5,
    #nodelabelsize=size,
    nodefillc=get_color.(1:Graphs.nv(t), Ref(comms.np), Ref(reverse(my_colors)))
)
Out[33]:
C18 C16 C1 C15 C3 C2 C13 C10 C7 C4 C11 C5 C9 C6 C17 C8 C12 C14

Communities from CNM-like¶

In [34]:
gplot(
    t, 
    nodesize = dsize,
    nodelabel = cnm_labels,
    nodelabeldist = 5.5,
    nodelabelsize = dsize,
    nodefillc = get_color.(1:Graphs.nv(t), Ref(cnm_comms.bp), Ref(my_colors))
)
Out[34]:
C1 C2 C4 C3 C10 C9 C16 C12 C6 C13 C11 C5 C15 C14 C8 C7

Which are the most important characters?¶

Identifying truly important and influential characters in a vast narrative like GoT may not be a trivial task, as it depends on the considered level of granularity. In these cases, the main character(s) in each plotline is referred with the term fractal protagonist(s), to indicate that the answer to "who is the protagonist" depends on the specific plotline.

Degree centrality¶

Who are the characters that apper in the majority of the scenes?

In [35]:
degrees = Dict{Int, Int}()

for v=1:nhv(hg)
    degrees[v] = length(gethyperedges(hg, v))
end

sorted_degrees = sort(collect(degrees), by=x->x[2], rev=true);
In [36]:
# Let's plot these data
characters = Array{String, 1}()
degrees = Array{Int, 1}()

max_c = 0

# we will visualize only characters appearing in at least 15 scenes
for c in sorted_degrees
    max_c = max_c > 15 ? break : max_c + 1

    push!(characters, string(get_vertex_meta(hg, c.first)))
    push!(degrees, c.second)
end
In [37]:
pos = collect(1:length(characters))

fig = plt.figure(figsize=(6, 4)) 
ax = fig.add_subplot(111)

rects = ax.barh(
    pos,
    degrees,
    align="center",
    tick_label=characters
)

#plt.tight_layout(.5)
plt.tick_params(labelsize="large")

plt.title("Season 1", pad=10, fontweight="semibold", fontsize="x-large")

plt.tight_layout()

Betweenness centrality¶

We investigated the importance of the characters also evaluating the betweenness centrality (BC) metric of hypergraph nodes. BC measures the centrality of a node by computing the number of times that a node acts as a bridge between the other two nodes, considering the shortest paths between them.

Here, we used the concept of s-beetwennes-centrality. Check out the paper for more detail about this metric.

In [38]:
# Here we evaluate betweennes value considering 1-paths, 2-paths, and 3-paths.
betweeness = Dict{Int, Dict{Symbol, Float64}}()

for s=1:3
    A = SimpleHypergraphs.adjacency_matrix(hg; s=s)
    G = Graphs.SimpleGraph(A)
    bc = Graphs.betweenness_centrality(G)

    for v=1:nhv(hg)
        push!(
            get!(betweeness, s, Dict{Symbol, Int}()),
            get_vertex_meta(hg, v) => bc[v]
        )
    end
end
In [39]:
sorted_betweeness = Dict{Int, Any}()

for s=1:3
    d = get!(betweeness, s, Dict{Symbol, Int}())
    d_sorted = sort(collect(d), by=x->x[2], rev=true)

    push!(
        sorted_betweeness,
        s => d_sorted
    )

end
In [40]:
sorted_betweeness
Out[40]:
Dict{Int64, Any} with 3 entries:
  2 => [:Tyrion_Lannister=>0.0842826, :Jon_Snow=>0.070162, :Eddard_Stark=>0.054…
  3 => [:Jon_Snow=>0.056747, :Eddard_Stark=>0.0513413, :Tyrion_Lannister=>0.048…
  1 => [:Arya_Stark=>0.271413, :Illyrio_Mopatis=>0.251899, :Tyrion_Lannister=>0…
In [41]:
# Getting top 10 characters for each s-value
#character => (HG_degree, G_degree)
data = Dict{Symbol, Array{Float64, 1}}()

for s=1:3
    higher_degree_characters = sorted_betweeness[s][1:10]

    for elem in higher_degree_characters
        if !haskey(data, elem.first)
            if s==1
                push!(
                    get!(data, elem.first, Array{Float64, 1}()),
                    elem.second,
                    betweeness[2][elem.first],
                    betweeness[3][elem.first]
                )
            elseif s==2
                push!(
                    get!(data, elem.first, Array{Float64, 1}()),
                    betweeness[1][elem.first],
                    elem.second,
                    betweeness[3][elem.first]
                )
            else
                push!(
                    get!(data, elem.first, Array{Float64, 1}()),
                    betweeness[1][elem.first],
                    betweeness[2][elem.first],
                    elem.second
                )
            end
        end
    end
end

#by highest betweenes degree in the graph, s=1
sorted_data = sort(collect(data), by=x->x[2][1], rev=true)
Out[41]:
13-element Vector{Pair{Symbol, Vector{Float64}}}:
       :Arya_Stark => [0.27141303430561975, 0.020811742624320874, 0.025567689757446024]
  :Illyrio_Mopatis => [0.25189924030387845, 0.0, 0.0]
 :Tyrion_Lannister => [0.15323433877389522, 0.08428255933571452, 0.048736052575128745]
     :Eddard_Stark => [0.11451260512166041, 0.05462031924487055, 0.051341324021137794]
    :Catelyn_Stark => [0.11139671344052993, 0.029487862630240228, 0.033451404486765125]
         :Jon_Snow => [0.11122931698056152, 0.07016200324650089, 0.05674696356665995]
             :Will => [0.06384113021458084, 0.03825136612021858, 0.0]
       :Lord_Varys => [0.06377540542298163, 0.0009193736202932523, 0.001578733585930707]
    :Rodrik_Cassel => [0.05228209175473733, 0.023788449409135447, 0.017966399768193867]
       :Bran_Stark => [0.042489406954219586, 0.0441890894890005, 0.029071370024299194]
    :Theon_Greyjoy => [0.038163554220999756, 0.030162976584305612, 0.023194820979115627]
 :Robert_Baratheon => [0.01766998385978779, 0.014594012937620571, 0.014715908661845302]
      :Sansa_Stark => [0.015381909177319377, 0.020828998970610812, 0.012916363931724326]
In [42]:
labels = Array{String, 1}()
s1 = Array{Float64, 1}()
s2 = Array{Float64, 1}()
s3 = Array{Float64, 1}()

for elem in sorted_data
    push!(labels, string(elem.first))
    push!(s1, elem.second[1])
    push!(s2, elem.second[2])
    push!(s3, elem.second[3])
end

clf()
fig = plt.figure(figsize=(7.5, 4.5))
ax = plt.axes([0.12, 0.32, 0.85, 0.6])

pos = collect(1:length(labels))# the x locations for the groups
width = 0.3  # the width of the bars

s1_rects = ax.bar(pos .- width/2, s1, width, label=L"$1-path$")
s2_rects = ax.bar(pos .+ width/2, s2, width, label=L"$2-path$")
s3_rects = ax.bar(pos .+ (width+width/2), s2, width, label=L"$3-path$")

# Add some text for labels, title and custom x-axis tick labels, etc.
plt.title("Season 8", pad=10, fontweight="semibold", fontsize="x-large")

plt.ylabel("Betweeness Centrality Score", fontweight="semibold", fontsize="x-large", labelpad=10)

ax.set_xticks(pos)
ax.set_xticklabels(labels)

ax.set_yticks(collect(range(0, stop=(.40 > maximum(s1) ? .4 : maximum(s1)), step=0.05))) #(season == 8 ? 25 : 10)

ax.legend()

plt.setp(ax.xaxis.get_majorticklabels(), rotation=65, ha="right", rotation_mode="anchor")
plt.tick_params(labelsize="large")

Some comments¶

The plot above, showing the betweenness centrality scores using (1,2,3)-paths, reveals that removing "shallow" relationships can bring to light a completely different insight about the structure of the network we are analyzing. Specifically, in this case, it is worth noting that:

  • Illyrio_Mopatis and Lord_Varys appear among the ten most important characters. However, they disappear when using (2,3)-paths, suggesting that these characters did not contribute in an evident way to the plot (but maybe behind the scenes);
  • Arya_Stark appear to be the most critical character when considering 1-paths. Nonetheless, her importance decreases when switching to (2,3)-paths.

This example highlight how just using 2-paths, we can grasp how characters really interact in the plot and which are the guys who tie all characters together.

In [ ]: