> By Fabien Tarissan, Matthieu Latapy and Christophe Prieur
Complex networks are at the core of an intense research activity. However, in most cases, intricate and costly measurement procedures are needed to explore their structure. In some cases, these measurements rely on link queries: given two nodes, it is possible to test the existence of a link between them. These tests may be costly, and thus minimizing their number while maximizing the number of discovered links is a key issue.
Based on the remark that complex networks share common statistical properties (relatively high local density, heterogeneous degree distribution), we propose a procedure in two steps. We first perform a short random phase in which the existence of a link between random pairs of nodes is tested. This gives a partial knowledge of the network topology which, related to the expected statistical properties, helps in a second step to predict what are the links that are the most likely to exist.
The plot above exemplifies this method. It shows the evolution of the number of discovered links (vertical axis) according to the number of tested links (horizontal axis) for different strategies used to order the link queries. They all have been tested on the same original network (the largest Flickr group observed during a study conducted in 2006).
The plot shows that different strategies have very different behaviours and that trying to optimize them is therefore relevant. It allows in particular to consider the degree distribution and the local density as valid principles on which ground the links prediction.
Yet, this plot doesn’t give any insight on the impact those measurements have on the obtained view. The induced bias can indeed be very important and it is therefore crucial to take this aspect into account in the procedure. This is what we intend to explore in further studies.