Evaluation of a new method for measuring the internet degree distribution: Simulation results

Christophe Crespelle and Fabien Tarissan

in Complex Networks, special issue of Computer Communications, 34-5, pp. 635-648 (2011), DOI: 10.1016/j.comcom.2010.06.006

Many contributions rely on the degree distribution of the Internet topology. However, current knowledge of this property is based on biased and erroneous measurements and is subject to much debate. Recently, a new approach, referred to as the Neighborhood Flooding method, was proposed to avoid issues raised by classical measurements. It aims at measuring the neighborhood of Internet core routers by sending traceroute probes from many monitors distributed in the Internet towards a given target router. In this paper, we investigate the accuracy of this method with simulations. Our results show that Neighborhood Flooding is free from the bias highlighted in the classical approach and is able to observe properly the exact degree of a vast majority of nodes in the core of the network. We show how the quality of the estimation depends on the number of monitors used and we carefully examine the influence of parameters of the simulations on our results. We also point out some limitations of the Neighborhood Flooding method and discuss their impact on the observed distribution.


Impact of Sources and Destinations on the Observed Properties of the Internet Topology

Frédéric Ouédraogo, Clémence Magnien

Computer Communications, 34, 670-679, 2011

Maps of the internet topology are generally obtained by measuring the routes from a given set of sources to a given set of destinations (with tools such as traceroute). It has been shown that this approach misses some links and nodes. Worse, in some cases it can induce a bias in the obtained data, i.e. the properties of the obtained maps are significantly different from those of the real topology. In order to reduce this bias, the general approach consists in increasing the number of sources. Some works have studied the relevance of this approach. Most of them have used theoretical results, or simulations on network models. Some papers have used real data obtained from actual measurement procedures to evaluate the importance of the number of sources and destinations, but no work to our knowledge has studied extensively the importance of the choice of sources or destinations. Here, we use real data from internet topology measurements to study this question: by comparing partial measurements to our complete data, we can evaluate the impact of adding sources or destinations on the observed properties. We show that the number of sources and destinations used plays a role in the observed properties, but that their choice, and not only their number, also has a strong influence on the observations. We then study common statistics used to describe the internet topology, and show that they behave differently: some can be trusted once the number of sources and destinations are not too small, while others are difficult to evaluate.


Estimating properties in dynamic systems: the case of churn in P2P networks

Lamia Benamara and Clémence Magnien

Proceedings of the third International Workshop on Network Science for Communication Networks (NetSciCom 2010), In conjunction with IEEE Infocom 2010.

In many systems, such as P2P systems, the dynamicity of participating elements, or churn, has a strong impact. As a consequence, many efforts have been made to characterize it, and in particular to capture the session length distribution. However in most cases, estimating it rigorously is difficult. One of the reasons is that, because the observation window is by definition finite, parts of the sessions that begin before the window and/or end after it are missed. This induces a bias. Although it tends to decrease when the observation window length increases, it is difficult to quantify its importance, or how fast it decreases. Here, we introduce a general methodology that allows us to know if the observation window is long enough to characterize a given property. This methodology is not specific to one study case and may be applied to any property in a dynamic system. We apply this methodology to the study of session lengths in a massive measurement of P2P activity in the eDonkey system. We show that the measurement needs to last for at least one week in order to obtain representative results. We also show that our methodology allows us to precisely characterize the shape of the session length distribution.


Rigorous measurement of IP-level neighborhood of Internet core routers

Christophe Crespelle, Matthieu Latapy, Elie Rotenberg

Proceedings of the third International Workshop on Network Science for Communication Networks (NetSciCom 2010), In conjunction with IEEE Infocom 2010.

Many contributions use the degree distribution of IP-level internet topology. However, current knowledge of this property relies on biased and erroneous measurements, and so it is subject to much debate. We introduce here a new approach, dedicated to the core of the internet, which avoids the issues raised by classical measurements. It is based on the measurement of IP-level neighborhood of internet core routers, for which we design and implement a rigorous method. It consists in sending traceroute probes from many monitors distributed in the internet towards a given target router and carefully selecting the relevant information in collected data. Using simulations, we provide strong evidence of the accuracy of our approach. We then conduct real-world measurements illustrating the practical effectiveness of our method. This constitutes a significant step towards reliable knowledge of the IP-level degree distribution of the core of the internet.


Efficient Measurement of Complex Networks Using Link Queries

Fabien Tarissan, Matthieu Latapy and Christophe Prieur

In Proceedings of the IEEE International Workshop on Network Science For Communication Networks (NetSciCom’09)

Complex networks are at the core of an intense research activity. However, in most cases, intricate and costly measurement procedures are needed to explore their structure. In some cases, these measurements rely on link queries: given two nodes, it is possible to test the existence of a link between them. These tests may be costly, and thus minimizing their number while maximizing the number of discovered links is a key issue. This is a challenging task, though, as initially no information is known on the network. This paper studies this problem: we observe that properties classically observed on real-world complex networks give hints for their efficient measurement; we derive simple principles and several measurement strategies based on this, and experimentally evaluate their efficiency on real-world cases. In order to do so, we introduce methods to evaluate the efficiency of strategies. We also explore the bias that different measurement strategies may induce.


Detection, Understanding, and Prevention of Traceroute Measurement Artifacts

Fabien Viger, Brice Augustin, Xavier Cuvellier, Clémence Magnien, Matthieu Latapy, Timur Friedman, and Renata Teixeira

Computer Networks 52-5 (2008), pp. 998-1018. Extended abstract published in the proceedings of the 6-th Internet Measurement Conference IMC’06, 2006, Rio de Janeiro, Brazil

Test of time award IMC 2022

Traceroute is widely used: from the diagnosis of network problems to the assemblage of internet maps. Unfortunately, there are a number of problems with traceroute methodology, which lead to the inference of erroneous routes. This paper studies particular structures arising in nearly all traceroute measurements. We characterize them as « loops », « cycles », and « diamonds ». We identify load balancing as a possible cause for the appearance of false loops, cycles and diamonds, i.e., artifacts that do not represent the internet topology. We provide a new publicly-available traceroute, called Paris traceroute, which, by controlling the packet header contents, provides a truer picture of the actual routes that packets follow. We performed measurements, from the perspective of a single source tracing towards multiple destinations, and Paris traceroute allowed us to show that many of the particular structures we observe are indeed traceroute measurement artifacts.

Complex Network Measurements: Estimating the Relevance of Observed Properties

Matthieu Latapy and Clémence Magnien

Infocom’08 Proceedings, Phoenix, USA

Complex networks, modeled as large graphs, received much attention during these last years. However, data on such networks is only available through intricate measurement procedures. Until recently, most studies assumed that these procedures eventually lead to samples large enough to be representative of the whole, at least concerning some key properties. This has crucial impact on network modeling and simulation, which rely on these properties. Recent contributions proved that this approach may be misleading, but no solution has been proposed. We provide here the first practical way to distinguish between cases where it is indeed misleading, and cases where the observed properties may be trusted. It consists in studying how the properties of interest evolve when the sample grows, and in particular whether they reach a steady state or not. In order to illustrate this method and to demonstrate its relevance, we apply it to data-sets on complex network measurements that are representative of the ones commonly used. The obtained results show that the method fulfills its goals very well. We moreover identify some properties which seem easier to evaluate in practice, thus opening interesting perspectives.


Distance distribution in random graphs and application to network exploration

Vincent D. Blondel, Jean-Loup Guillaume, Julien M. Hendrickx, and Raphaël M. Jungers

Phys. Rev. E 76, 066101 (2007)

We consider the problem of determining the proportion of edges that are discovered in an ErdsRényigraph when one constructs all shortest paths from a given source node to all other nodes. This problem is equivalent to the one of determining the proportion of edges connecting nodes that are at identical distance from the source node. The evolution of this quantity with the probability of existence of the edges exhibits intriguing oscillatory behavior. In order to perform our analysis, we introduce a different way of computing the distribution of distances between nodes. Our method outperforms previous similar analyses and leads to estimates that coincide remarkably well with numerical simulations. It allows us to characterize the phase transitions appearing when the connectivity probability varies.


Complex Network Metrology

Jean-Loup Guillaume and Matthieu Latapy

Complex Systems 16, pages 83-94, 2005

In order to study some complex networks like the Internet, the Web, social networks or biological networks, one first has to explore them. This gives a partial and biased view of the real object, which is generally assumed to be representative of the whole. However, up to now nobody knows how and how much the measure influences the results. Using the example of the Internet and a rough model of its exploration process, we show that the way a given complex network is explored may strongly influence the observed properties. This leads us to argue for the necessity of developing a science of metrology of complex networks. Its aim would be to study how the partial and biased view of a network relates to the properties of the whole network.


Relevance of Massively Distributed Explorations of the Internet Topology: Qualitative Results

Jean-Loup Guillaume, Matthieu Latapy and Damien Magoni

Computer Networks 50, pages 3197-3224, 2006. Extended abstract published in the proceedings of the 24-th IEEE international conference Infocom’05, 2005, Miami, USA

Internet maps are generally constructed using the traceroute tool from a few sources to many destinations. It appeared recently that this exploration process gives a partial and biased view of the real topology, which leads to the idea of increasing the number of sources to improve the quality of the maps. In this paper, we present a set of experiments we have conducted to evaluate the relevance of this approach. It appears that the statistical properties of the underlying network have a strong influence on the quality of the obtained maps, which can be improved using massively distributed explorations. Conversely, some statistical properties are very robust, and so the known values for the Internet may be considered as reliable. We validate our analysis using real-world data and experiments, and we discuss its implications.