By Fabien Tarissan, Matthieu Latapy and Christophe Prieur
In a previous study presented in Efficient Measurement of Complex Networks Using Link Queries, we showed that different measurement strategies behave very differently as regard the rapidity for retrieving existing links in large complex networks. It appeared in particular that the strategies based on the degree distribution and the local density were the most efficient whereas the random approaches were inefficient (see this previous plot for instance). But it didn’t tell anything as regard the shape of the samples that were extracted by those strategies.
The plot presented here shows the evolution of the clustering coefficient (ratio of the number of triangles over the number of triples) for different measurement strategies applied on the same original network. The grey dashed line shows the real value. The other curves correspond to the evaluation of this coefficient for the different strategies.
Three main events can be pointed out. First, one might observe that strategies 2 and 5 have a strong overestimation of this coefficient during the random phase whereas strategies 1, 3 and 4 have on the opposite a strong underestimation of the coefficient during the same phase (event A). This is clearly explained by the fact that strategies 2 and 5 test the closure of any triple as soon as it appears during the measurement.
The second event (B) concerns the difference between strategy 3 (relying on on the degree distribution) and strategies 4 and 5 (relying on the local density). It shows that basing on one of those two properties strongly affect the evaluation of the clustering coefficient.
Finally, one can observe that the overestimation of strategies 4 and 5 is eventually balanced as they switch (event C) to relying on the degree distribution which tends to underestimate the coefficient.