Diameter of massive graphs

> By Clémence Magnien, Matthieu Latapy and Michel Habib

Diameter of massive graphs

Diameter of massive graphs

Computing the diameter (i.e. the maximal distance between two nodes) of a huge graph is in many cases too time-consuming to be performed.

In Fast Computation of Empirically Tight Bounds for the Diameter of Massive Graphs we propose several methods to obtain upper and lower bounds for the diameter of a given graph. Although these bounds are guaranteed (it is sure that the diameter is between the bounds), we do not know if the bounds are tight in general. We therefore computed them on a variety of real-world cases; in all of them, the results were excellent: the difference between the upper and lower bounds was a few units only. As the computations needed to compute these bounds are very efficient, this provides an effective solution for diameter estimations when exact computation is out of reach (or not needed).

Our methods to compute upper and lower bounds are not deterministic: if we run them several times, we may obtain different values. As, in all cases, it is guaranteed that the diameter is indeed between the bounds, one may cumpute several bounds and keep the best ones.

The plot above represents the distributions of the obtained bounds for each method, ran 2000 times each on a web graph of approximately 40 million nodes and 800 million links. Each line corresponds to a different method (detailed in the paper)). For upper bounds (the three rightmost lines), we plot for each value x on the horizontal axis the number of times that the method output a bound larger than or equal to x. For lower bounds (the two leftmost lines) we plot for each value x on the horizontal axis the number of times that the method output a bound smaller than or equal to x.

In this case, we therefore reach the conclusion that our 40 million node graph has a diameter between 32 and 33 (it is thereforre 32 or 33, exactly). This is a very precise result, typical of this approach. It is obtained in minutes, while classical methods are unable to handle such massive graphs even within weeks.

This entry was posted in Plots and tagged , ,