Download this tutorial as a PDF file.

Download the example datasets and R code this tutorial uses.



1 Introduction: network visualization

The main concern in designing a network visualization is the purpose it has to serve. What are the structural properties that we want to highlight? What are the key concerns we want to address?



Network maps are far from the only visualization available for graphs - other network representation formats, and even simple charts of key characteristics, may be more appropriate in some cases.



In network maps, as in other visualization formats, we have several key elements that control the outcome. The major ones are color, size, shape, and position.



Modern graph layouts are optimized for speed and aesthetics. In particular, they seek to minimize overlaps and edge crossing, and ensure similar edge length across the graph.





You can download all workshop materials here, or visit kateto.net/sunbelt2019.

This tutorial uses several key packages that you will need to install in order to follow along. Other packages will be mentioned along the way, but those are not critical and can be skipped.

The main packages we are going to use are igraph (maintained by Gabor Csardi and Tamas Nepusz), sna & network (maintained by Carter Butts and the Statnet team), ggraph(maintained by Thomas Lin Pederson), visNetwork (maintained by Benoit Thieurmel), threejs (maintained by Bryan W. Lewis), NetworkD3 (maintained by Christopher Gandrud), and ndtv (maintained by Skye Bender-deMoll).

   

install.packages("igraph") 
install.packages("network") 
install.packages("sna")
install.packages("ggraph")
install.packages("visNetwork")
install.packages("threejs")
install.packages("networkD3")
install.packages("ndtv")

           


       

2 Colors in R plots

Colors are pretty, but more importantly, they help people differentiate between types of objects or levels of an attribute. In most R functions, you can use named colors, hex, or RGB values.

In the simple base R plot chart below, x and y are the point coordinates, pch is the point symbol shape, cex is the point size, and col is the color. To see the parameters for plotting in base R, check out ?par.

plot(x=1:10, y=rep(5,10), pch=19, cex=3, col="dark red")
points(x=1:10, y=rep(6, 10), pch=19, cex=3, col="557799")
points(x=1:10, y=rep(4, 10), pch=19, cex=3, col=rgb(.25, .5, .3))


You may notice that RGB here ranges from 0 to 1. While this is the R default, you can also set it to the 0-255 range using something like rgb(10, 100, 100, maxColorValue=255).

We can set the opacity/transparency of an element using the parameter alpha (range 0-1):

plot(x=1:5, y=rep(5,5), pch=19, cex=12, col=rgb(.25, .5, .3, alpha=.5), xlim=c(0,6))  


If we have a hex color representation, we can set the transparency alpha using adjustcolor from package grDevices. For fun, let’s also set the plot background to gray using the par() function for graphical parameters. We won’t do that below, but we could set the margins of the plot with par(mar=c(bottom, left, top, right)), or tell R not to clear the previous plot before adding a new one with par(new=TRUE).

par(bg="gray40")
col.tr <- grDevices::adjustcolor("557799", alpha=0.7)
plot(x=1:5, y=rep(5,5), pch=19, cex=12, col=col.tr, xlim=c(0,6)) 


If you plan on using the built-in color names, here’s how to list all of them:

colors()                          # List all named colors
grep("blue", colors(), value=T)   # Colors that have "blue" in the name


In many cases, we need a number of contrasting colors, or multiple shades of a color. R comes with some predefined palette function that can generate those for us. For example:

pal1 <- heat.colors(5, alpha=1)   #  5 colors from the heat palette, opaque
pal2 <- rainbow(5, alpha=.5)      #  5 colors from the heat palette, transparent
plot(x=1:10, y=1:10, pch=19, cex=5, col=pal1)


plot(x=1:10, y=1:10, pch=19, cex=5, col=pal2)


We can also generate our own gradients using colorRampPalette. Note that colorRampPalette returns a function that we can use to generate as many colors from that palette as we need.

palf <- colorRampPalette(c("gray80", "dark red")) 
plot(x=10:1, y=1:10, pch=19, cex=5, col=palf(10)) 


To add transparency to colorRampPalette, you need to use a parameter alpha=TRUE:

palf <- colorRampPalette(c(rgb(1,1,1, .2),rgb(.8,0,0, .7)), alpha=TRUE)
plot(x=10:1, y=1:10, pch=19, cex=5, col=palf(10)) 

Finding good color combinations is a tough task - and the built-in R palettes are rather limited. Thankfully there are other available packages for this:

# If you don't have R ColorBrewer already, you will need to install it:
install.packages('RColorBrewer')
library('RColorBrewer')
display.brewer.all()


This package has one main function, called brewer.pal. To use it, you just need to select the desired palette and a number of colors. Let’s take a look at some of the RColorBrewer palettes:

display.brewer.pal(8, "Set3")

display.brewer.pal(8, "Spectral")

display.brewer.pal(8, "Blues")

Using RColorBrewer palettes in plots:

pal3 <- brewer.pal(10, "Set3")
plot(x=10:1, y=10:1, pch=19, cex=6, col=pal3)


plot(x=10:1, y=10:1, pch=19, cex=6, col=rev(pal3)) # backwards

 




3 Data format, size, and preparation

In this tutorial, we will work primarily with two small example data sets. Both contain data about media organizations. One involves a network of hyperlinks and mentions among news sources. The second is a network of links between media venues and consumers.

While the example data used here is small, many of the ideas behind the visualizations we will generate apply to medium and large-scale networks. This is also the reason why we will rarely use certain visual properties such as the shape of the node symbols: those are impossible to distinguish in larger graph maps. In fact, when drawing very big networks we may even want to hide the network edges, and focus on identifying and visualizing communities of nodes.

At this point, the size of the networks you can visualize in R is limited mainly by the RAM of your machine. One thing to emphasize though is that in many cases, visualizing larger networks as giant hairballs is less helpful than providing charts that show key characteristics of the graph.

   

3.1 DATASET 1: edgelist

The first data set we are going to work with consists of two files, “Dataset1-Media-Example-NODES.csv” and “Dataset1-Media-Example-EDGES.csv” (download here).

nodes <- read.csv("Dataset1-Media-Example-NODES.csv", header=T, as.is=T)
links <- read.csv("Dataset1-Media-Example-EDGES.csv", header=T, as.is=T)

Examine the data:

head(nodes)
head(links)

3.2 Creating an igraph object

Next we will convert the raw data to an igraph network object. To do that, we will use the graph_from_data_frame() function, which takes two data frames: d and vertices.

  • d describes the edges of the network. Its first two columns are the IDs of the source and the target node for each edge. The following columns are edge attributes (weight, type, label, or anything else).
  • vertices starts with a column of node IDs. Any following columns are interpreted as node attributes.
library('igraph')
net <- graph_from_data_frame(d=links, vertices=nodes, directed=T) 
net
## IGRAPH 5151461 DNW- 17 49 -- 
## + attr: name (v/c), media (v/c), media.type (v/n), type.label
## | (v/c), audience.size (v/n), type (e/c), weight (e/n)
## + edges from 5151461 (vertex names):
##  [1] s01->s02 s01->s03 s01->s04 s01->s15 s02->s01 s02->s03 s02->s09
##  [8] s02->s10 s03->s01 s03->s04 s03->s05 s03->s08 s03->s10 s03->s11
## [15] s03->s12 s04->s03 s04->s06 s04->s11 s04->s12 s04->s17 s05->s01
## [22] s05->s02 s05->s09 s05->s15 s06->s06 s06->s16 s06->s17 s07->s03
## [29] s07->s08 s07->s10 s07->s14 s08->s03 s08->s07 s08->s09 s09->s10
## [36] s10->s03 s12->s06 s12->s13 s12->s14 s13->s12 s13->s17 s14->s11
## [43] s14->s13 s15->s01 s15->s04 s15->s06 s16->s06 s16->s17 s17->s04

 

The description of an igraph object starts with four letters:

  1. D or U, for a directed or undirected graph
  2. N for a named graph (where nodes have a name attribute)
  3. W for a weighted graph (where edges have a weight attribute)
  4. B for a bipartite (two-mode) graph (where nodes have a type attribute)

The two numbers that follow (17 49) refer to the number of nodes and edges in the graph. The description also lists node & edge attributes, for example:

  • (g/c) - graph-level character attribute
  • (v/c) - vertex-level character attribute
  • (e/n) - edge-level numeric attribute

We also have easy access to nodes, edges, and their attributes with:

E(net)       # The edges of the "net" object
V(net)       # The vertices of the "net" object
E(net)$type  # Edge attribute "type"
V(net)$media # Vertex attribute "media"

# Find nodes and edges by attribute:
# (that returns oblects of type vertex sequence/edge sequence)
V(net)[media=="BBC"]
E(net)[type=="mention"]

# You can also examine the network matrix directly:
net[1,]
net[5,7]

 

It is also easy to extract an edge list or matrix back from the igraph network:

# Get an edge list or a matrix:
as_edgelist(net, names=T)
as_adjacency_matrix(net, attr="weight")

# Or data frames describing nodes and edges:
as_data_frame(net, what="edges")
as_data_frame(net, what="vertices")

 

Now that we have our igraph network object, let’s make a first attempt to plot it.

plot(net) # not a pretty picture!



That doesn’t look very good. Let’s start fixing things by removing the loops in the graph.

net <- simplify(net, remove.multiple = F, remove.loops = T) 

 

We could also use simplify to combine multiple edges by summing their weights with a command like simplify(net, edge.attr.comb=list(Weight="sum","ignore")). Note, however, that this would also combine multiple edge types (in our data: “hyperlinks” and “mentions”).

Let’s and reduce the arrow size and remove the labels (we do that by setting them to NA):

plot(net, edge.arrow.size=.4,vertex.label=NA)

   

3.3 DATASET 2: matrix

  Our second dataset is a network of links between news outlets and consumers. It includes two files, “Dataset2-Media-Example-NODES.csv” and “Dataset2-Media-Example-EDGES.csv” (download here).

nodes2 <- read.csv("Dataset2-Media-User-Example-NODES.csv", header=T, as.is=T)
links2 <- read.csv("Dataset2-Media-User-Example-EDGES.csv", header=T, row.names=1)

Examine the data:

head(nodes2)
head(links2)

3.4 Two-mode (bipartite) networks in igraph

Next we will convert our second network into an igraph object.

We can see that links2 is an adjacency matrix for a two-mode network. Two-mode or bipartite graphs have two different types of actors and links that go across, but not within each type. Our second media example is a network of that kind, examining links between news sources and their consumers.

links2 <- as.matrix(links2)
dim(links2)
dim(nodes2)

As we have seen above, the edges of our second network are in a matrix format. We can read those into a graph object using graph_from_incidence_matrix(). In igraph, bipartite networks have a node attribute called type that is FALSE (or 0) for vertices in one mode and TRUE (or 1) for those in the other mode.

 

head(nodes2)
head(links2)

net2 <- graph_from_incidence_matrix(links2)
table(V(net2)$type)

 

To transform a one-mode network matrix into an igraph object, use graph_from_adjacency_matrix().




4 Plotting networks with igraph



4.1 Plotting parameters

Plotting with igraph: the network plots have a wide set of parameters you can set. Those include node options (starting with vertex.) and edge options (starting with edge.). A list of selected options is included below, but you can also check out ?igraph.plotting for more information.

 

The igraph plotting parameters include (among others):

NODES  
vertex.color  Node color
vertex.frame.color  Node border color
vertex.shape  One of “none”, “circle”, “square”, “csquare”, “rectangle”
 “crectangle”, “vrectangle”, “pie”, “raster”, or “sphere”
vertex.size  Size of the node (default is 15)
vertex.size2  The second size of the node (e.g. for a rectangle)
vertex.label  Character vector used to label the nodes
vertex.label.family  Font family of the label (e.g.“Times”, “Helvetica”)
vertex.label.font  Font: 1 plain, 2 bold, 3, italic, 4 bold italic, 5 symbol
vertex.label.cex  Font size (multiplication factor, device-dependent)
vertex.label.dist  Distance between the label and the vertex
vertex.label.degree  The position of the label in relation to the vertex, where
 0 is right, “pi” is left, “pi/2” is below, and “-pi/2” is above
EDGES  
edge.color  Edge color
edge.width  Edge width, defaults to 1
edge.arrow.size  Arrow size, defaults to 1
edge.arrow.width  Arrow width, defaults to 1
edge.lty  Line type, could be 0 or “blank”, 1 or “solid”, 2 or “dashed”,
 3 or “dotted”, 4 or “dotdash”, 5 or “longdash”, 6 or “twodash”
edge.label  Character vector used to label edges
edge.label.family  Font family of the label (e.g.“Times”, “Helvetica”)
edge.label.font  Font: 1 plain, 2 bold, 3, italic, 4 bold italic, 5 symbol
edge.label.cex  Font size for edge labels
edge.curved  Edge curvature, range 0-1 (FALSE sets it to 0, TRUE to 0.5)
arrow.mode  Vector specifying whether edges should have arrows,
 possible values: 0 no arrow, 1 back, 2 forward, 3 both
OTHER  
margin  Empty space margins around the plot, vector with length 4
frame  if TRUE, the plot will be framed
main  If set, adds a title to the plot
sub  If set, adds a subtitle to the plot
asp   Numeric, the aspect ratio of a plot (y/x).
palette  A color palette to use for vertex color
rescale  Whether to rescale coordinates to [-1,1]. Default is TRUE.

   

We can set the node & edge options in two ways - the first one is to specify them in the plot() function, as we are doing below.

# Plot with curved edges (edge.curved=.1) and reduce arrow size:
# Note that using curved edges will allow you to see multiple links
# between two nodes (e.g. links going in either direction, or multiplex links)
plot(net, edge.arrow.size=.4, edge.curved=.1)

# Set edge color to light gray, the node & border color to orange 
# Replace the vertex label with the node names stored in "media"
plot(net, edge.arrow.size=.2, edge.color="orange",
     vertex.color="orange", vertex.frame.color="#ffffff",
     vertex.label=V(net)$media, vertex.label.color="black") 

The second way to set attributes is to add them to the igraph object. Let’s say we want to color our network nodes based on type of media, and size them based on degree centrality (more links -> larger node) We will also change the width of the edges based on their weight.

# Generate colors based on media type:
colrs <- c("gray50", "tomato", "gold")
V(net)$color <- colrs[V(net)$media.type]

# Compute node degrees (#links) and use that to set node size:
deg <- degree(net, mode="all")
V(net)$size <- deg*3
# We could also use the audience size value:
V(net)$size <- V(net)$audience.size*0.6

# The labels are currently node IDs.
# Setting them to NA will render no labels:
V(net)$label <- NA

# Set edge width based on weight:
E(net)$width <- E(net)$weight/6

#change arrow size and edge color:
E(net)$arrow.size <- .2
E(net)$edge.color <- "gray80"

# We can even set the network layout:
graph_attr(net, "layout") <- layout_with_lgl
plot(net) 

We can also override the attributes explicitly in the plot:

plot(net, edge.color="orange", vertex.color="gray50") 

It helps to add a legend explaining the meaning of the colors we used:

plot(net) 
legend(x=-1.5, y=-1.1, c("Newspaper","Television", "Online News"), pch=21,
       col="#777777", pt.bg=colrs, pt.cex=2, cex=.8, bty="n", ncol=1)

Sometimes, especially with semantic networks, we may be interested in plotting only the labels of the nodes:

plot(net, vertex.shape="none", vertex.label=V(net)$media, 
     vertex.label.font=2, vertex.label.color="gray40",
     vertex.label.cex=.7, edge.color="gray85")

Let’s color the edges of the graph based on their source node color. We can get the starting node for each edge with the ends() igraph function. It returns the start and end vertex for edges listed in the es parameter. The names parameter control whether the function returns edge names or IDs.

edge.start <- ends(net, es=E(net), names=F)[,1]
edge.col <- V(net)$color[edge.start]

plot(net, edge.color=edge.col, edge.curved=.1)  

4.2 Network layouts

Network layouts are simply algorithms that return coordinates for each node in a network.

For the purposes of exploring layouts, we will generate a slightly larger 100-node graph. We use the sample_pa() function which generates a simple graph starting from one node and adding more nodes and links based on a preset level of preferential attachment (Barabasi-Albert model).

net.bg <- sample_pa(100) 
V(net.bg)$size <- 8
V(net.bg)$frame.color <- "white"
V(net.bg)$color <- "orange"
V(net.bg)$label <- "" 
E(net.bg)$arrow.mode <- 0
plot(net.bg)

You can set the layout in the plot function:

plot(net.bg, layout=layout_randomly)

Or you can calculate the vertex coordinates in advance:

l <- layout_in_circle(net.bg)
plot(net.bg, layout=l)

l is simply a matrix of x, y coordinates (N x 2) for the N nodes in the graph. For 3D layouts, it has x, y, and z coordinates (N x 3). You can easily generate your own:

l <- cbind(1:vcount(net.bg), c(1, vcount(net.bg):2))
plot(net.bg, layout=l)

This layout is just an example and not very helpful - thankfully igraph has a number of built-in layouts, including:

# Randomly placed vertices
l <- layout_randomly(net.bg)
plot(net.bg, layout=l)

# Circle layout
l <- layout_in_circle(net.bg)
plot(net.bg, layout=l)

# 3D sphere layout
l <- layout_on_sphere(net.bg)
plot(net.bg, layout=l)

Fruchterman-Reingold is one of the most used force-directed layout algorithms out there.

Force-directed layouts try to get a nice-looking graph where edges are similar in length and cross each other as little as possible. They simulate the graph as a physical system. Nodes are electrically charged particles that repulse each other when they get too close. The edges act as springs that attract connected nodes closer together. As a result, nodes are evenly distributed through the chart area, and the layout is intuitive in that nodes which share more connections are closer to each other. The disadvantage of these algorithms is that they are rather slow and therefore less often used in graphs larger than ~1000 vertices.

l <- layout_with_fr(net.bg)
plot(net.bg, layout=l)


With force-directed layouts, you can use the niter parameter to control the number of iterations to perform. The default is set at 500 iterations. You can lower that number for large graphs to get results faster and check if they look reasonable.

l <- layout_with_fr(net.bg, niter=50)
plot(net.bg, layout=l)


The layout can also interpret edge weights. You can set the “weights” parameter which increases the attraction forces among nodes connected by heavier edges.

ws  <-  c(1, rep(100, ecount(net.bg)-1))
lw <- layout_with_fr(net.bg, weights=ws)
plot(net.bg, layout=lw) 

You will also notice that the Fruchterman-Reingold layout is not deterministic - different runs will result in slightly different configurations. Saving the layout in l allows us to get the exact same result multiple times, which can be helpful if you want to plot the time evolution of a graph, or different relationships – and want nodes to stay in the same place in multiple plots.

par(mfrow=c(2,2), mar=c(0,0,0,0))   # plot four figures - 2 rows, 2 columns
plot(net.bg, layout=layout_with_fr)
plot(net.bg, layout=layout_with_fr)
plot(net.bg, layout=l)
plot(net.bg, layout=l)

dev.off()

By default, the coordinates of the plots are rescaled to the [-1,1] interval for both x and y. You can change that with the parameter rescale=FALSE and rescale your plot manually by multiplying the coordinates by a scalar. You can use norm_coords to normalize the plot with the boundaries you want. This way you can create more compact or spread out layout versions.

l <- layout_with_fr(net.bg)
l <- norm_coords(l, ymin=-1, ymax=1, xmin=-1, xmax=1)

par(mfrow=c(2,2), mar=c(0,0,0,0))
plot(net.bg, rescale=F, layout=l*0.4)
plot(net.bg, rescale=F, layout=l*0.6)
plot(net.bg, rescale=F, layout=l*0.8)
plot(net.bg, rescale=F, layout=l*1.0)

dev.off()


Some layouts have 3D versions that you can use with parameter dim=3. As you might expect, a 3D layout returns a matrix with 3 columns containing the X, Y, and Z coordinates of each node.

l <- layout_with_fr(net.bg, dim=3)
plot(net.bg, layout=l)


Another popular force-directed algorithm that produces nice results for connected graphs is Kamada Kawai. Like Fruchterman Reingold, it attempts to minimize the energy in a spring system.

l <- layout_with_kk(net.bg)
plot(net.bg, layout=l)

Graphopt is a nice force-directed layout implemented in igraph that uses layering to help with visualizations of large networks.

l <- layout_with_graphopt(net.bg)
plot(net.bg, layout=l)

The available graphopt parameters can be used to change the mass and electric charge of nodes, as well as the optimal spring length and the spring constant for edges. The parameter names are charge (defaults to 0.001), mass (defaults to 30), spring.length (defaults to 0), and spring.constant (defaults to 1). Tweaking those can lead to considerably different graph layouts.

# The charge parameter below changes node repulsion:
l1 <- layout_with_graphopt(net.bg, charge=0.02)
l2 <- layout_with_graphopt(net.bg, charge=0.00000001)

par(mfrow=c(1,2), mar=c(1,1,1,1))
plot(net.bg, layout=l1)
plot(net.bg, layout=l2)

dev.off() 

The LGL algorithm is meant for large, connected graphs. Here you can also specify a root: a node that will be placed in the middle of the layout.

plot(net.bg, layout=layout_with_lgl)

The MDS (multidimensional scaling) algorithm tries to place nodes based on some measure of similarity or distance between them. More similar nodes are plotted closer to each other. By default, the measure used is based on the shortest paths between nodes in the network. We can change that by using our own distance matrix (however defined) with the parameter dist. MDS layouts are nice because positions and distances have a clear interpretation. The problem with them is visual clarity: nodes often overlap, or are placed on top of each other.

plot(net.bg, layout=layout_with_mds)

Let’s take a look at all available layouts in igraph:

layouts <- grep("^layout_", ls("package:igraph"), value=TRUE)[-1] 
# Remove layouts that do not apply to our graph.
layouts <- layouts[!grepl("bipartite|merge|norm|sugiyama|tree", layouts)]

par(mfrow=c(3,3), mar=c(1,1,1,1))
for (layout in layouts) {
  print(layout)
  l <- do.call(layout, list(net)) 
  plot(net, edge.arrow.mode=0, layout=l, main=layout) }