Bias in generated random graphs

> By Fabien Viger and Matthieu Latapy

Bias in generated random graphs

Bias in generated random graphs

One of the main approaches for modelling complex networks is to sample a random graph with prescribed degree distribution; in other words, one chooses a graph uniformly at random (i.e. all graphs have the same probability to be chosen) among all the graphs with a given number N of nodes and a given degree distribution.

Constructing such a graph is easy: one considers N nodes with no link, then the degree of each node is chosen (according to the prescribed degree distribution) and this number of half-links are attached to the node. Finally links are constructed by choosing random pairs of half-links.

However, this procedure does not exactly produces a random graph as expected: it produces a random non-simple multi-graph, which means that there may be loops (links from one node to itself) and multiple links (several links between the same two nodes). In addition, the obtained graph may not be connected (there is not always a path from any node to any other), which is in general an expected feature of complex network models.

In practice, one generally solves these issues simply by removing loops, multiple links (obtaining this way what we call the simple version of the graph), and by keeping only the largest connected component, which in practice contains most nodes. It must be clear however that doing this may lead to biased graphs that do not have the prescribed degree distribution, or even the given size. Moreover, the sampling is not uniform anymore.

The plot above illustrates this: for each value of the average degree (horizontal axis) we generated a random multi-graph with a typical (heavy-tailed) degree distribution, and then computed the average degree of its largest connected component (green plot, zc), the one of its simple version (red plot, zs) and the one of the simple version of its largest connected component (purple plot, zsc). The value on the vertical axis is the ratio between the prescribed value (the average degree of the constructed multi-graph) and the obtained one.

These plots clearly show that in most cases the obtained average degree deviates significantly from the expected one (the horizontal black line at y=1).

An alternative to the unsatisfactory approache described above consists in directly sampling a random connected simple graph, thus avoiding the postprocessing procedure described above responsible for the sampling bias. This is however not trivial; see our paper entitled Random generation of large connected simple graphs with prescribed degree distribution for details, and our random graph generator.

This entry was posted in Plots and tagged ,