> By Oussama Allali, Matthieu Latapy and Clémence Magnien
Link prediction is a key research problem in the analysis of network dynamics. It aims at predicting the links which will appear in the future evolution of the network.
We consider here a set of peers and files, where each peer is linked to the files it provides. The collection of such data is described in the paper Ten weeks in the life of an eDonkey server.
We use here nine days of measurement to define a notion of similarity between peers.
We define here the similarity between two peers as the number of files they provide in common (i.e. the number of files provided by them both). For each peer, we consider its most similar peers as being the ones for which this number is greater than a given threshold.
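The definition above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the dictionary `provides` (mapping each peer to the set of files it provides) and the function names `similarity` and `most_similar` are assumptions introduced here.

```python
def similarity(provides, p, q):
    """Number of files that peers p and q both provide."""
    return len(provides[p] & provides[q])


def most_similar(provides, p, threshold):
    """Peers whose similarity with p is strictly greater than the threshold."""
    return {
        q for q in provides
        if q != p and similarity(provides, p, q) > threshold
    }


# Toy data (hypothetical): peer -> set of provided files.
provides = {
    "a": {1, 2, 3},
    "b": {2, 3, 4},
    "c": {5},
}
print(similarity(provides, "a", "b"))
print(most_similar(provides, "a", 1))
```

On real data, comparing every pair of peers is quadratic in the number of peers, so an actual computation would probably go through the files instead (only peers sharing at least one file can have a non-zero similarity).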
Our link prediction then consists in saying that each peer p will become a provider of the files provided by all the peers which are most similar to p.
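As a rough sketch of this prediction step, the Python function below builds the set of predicted (peer, file) links for a given threshold. It rests on two assumptions that the text does not spell out: the predicted files for p are taken as the union of the files of its most similar peers, and files that p already provides are left out. The names `provides` and `predict_links` are hypothetical.

```python
def predict_links(provides, threshold):
    """Predict (peer, file) links from the most similar peers.

    provides: dict mapping each peer to the set of files it provides.
    Two peers are "most similar" when the number of files they share
    is strictly greater than the threshold.
    """
    predicted = set()
    for p, files_p in provides.items():
        for q, files_q in provides.items():
            if q == p:
                continue
            if len(files_p & files_q) > threshold:
                # Assumption: predict every file of q that p lacks.
                predicted.update((p, f) for f in files_q - files_p)
    return predicted


# Toy data (hypothetical): peer -> set of provided files.
provides = {
    "a": {1, 2, 3},
    "b": {2, 3, 4},
    "c": {5},
}
print(predict_links(provides, 1))
```

With this toy data and a threshold of 1, peers "a" and "b" are most similar to each other, so each one is predicted to provide the file it is missing from the other.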
In order to evaluate the relevance of this approach, we consider different values for the similarity threshold (horizontal axis on the plot above). We then compute, for each threshold, the success rate and the proportion of discovered links of the method (vertical axis on the plot above). The success rate is the fraction of predicted links which indeed appear during the next day. The proportion of discovered links is the fraction of links appearing during the next day that the method succeeds in predicting.
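These two quantities correspond to what is usually called precision and recall. The sketch below computes them from two sets of (peer, file) pairs; the function name `evaluate`, the argument names, and the convention of returning 0.0 when a set is empty are assumptions made for this illustration.

```python
def evaluate(predicted, new_links):
    """Return (success rate, proportion of discovered links).

    predicted: set of (peer, file) links predicted by the method.
    new_links: set of (peer, file) links that appear during the next day.
    """
    hits = len(predicted & new_links)
    # Fraction of predicted links that indeed appear.
    success_rate = hits / len(predicted) if predicted else 0.0
    # Fraction of appearing links that were predicted.
    discovered = hits / len(new_links) if new_links else 0.0
    return success_rate, discovered


# Toy data (hypothetical).
predicted = {("a", 4), ("b", 1)}
new_links = {("a", 4), ("b", 7), ("c", 8), ("c", 9)}
print(evaluate(predicted, new_links))
```

Here one of the two predicted links appears, and it is one of the four links that appear, which gives a success rate of 0.5 and a proportion of discovered links of 0.25.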
The success rate plot clearly shows that there is an important benefit in using a high threshold for measuring the similarity between peers. However, the discovered links plot shows that this benefit has a cost: a high threshold decreases the proportion of new links that the method discovers.
Notice also the plateau on the plots between thresholds 2000 and 3000: in this range, we reach a situation where the similarity between peers is the same and is maximal.