> By Elie Rotenberg and Christophe Crespelle
This plot shows how distance between hosts seen among traceroute outputs and the targets of these traceroutes are likely to be multiple regarding to their average distance.
The data is gathered as follows. We first pick up 60 random, ping-answering IP addresses. We then send successively one traceroute from each of 516 Planetlab hosts (called “sources”), towards each of thoses IPs (called “targets”). For a given target, a traceroute is sent a few seconds after the previous one ended (either with a success or a failure). So far we gathered about 31 thousands raw traceroute outputs over about 10 hours.
For each route given by traceroute towards a given target, we look up the distance (in hops) within the route between the target and each address (or node) present in the route – we call this number “trace route distance”. This way we get for every target and for each address, its average traceroute distance (which we round to the nearest integer) to target over sources, and whether it appears at least twice at two different distances. Then we compute globally the proportion of nodes appearing at a given, rounded average traceroute distance, that appear at least at two different traceroute distances.
Between average distance 7 and 27, the plot can be compared to an affine function, with a coefficient of 0.7% per hop. Indeed, there is a (statistical) chance at each hop to cross a router balancing between routes of different length, leading to different traceroute distances. Therefore the likeliness to be seen at multiple traceroute distances increases with distance to the target.
The rightmost part of the plot may be quite inaccurate since it is mostly based upon nodes very close to the sources that have no reason to be seen multiple times for a given target source-round (remember that every source performs exactly one traceroute). It should anyway be looked close: how can we explain such a ceiling?
The leftmost part of the plot is far more relevant, since the nodes very close to the target are expected to be seen by many sources. The 0-1-distance bracket is naturally empty, since only targets can have occurences of traceroute distance < 1 (that is, exactly 0). The 1-1.5-distance bracket may seem critically low but it can be explained as follows: since the nodes with an average traceroute distance < 1.5 must have a least 50% of their occurences at traceroute distance 1, then these nodes are very unlikely to be exposed to load-balancing and thus to have multiple traceroute distance occurences.
But the very odd part is the 1.5-3.5-distance bracket. The very high rate of multiple traceroute distance node hints that the nodes of average traceroute distance between 1.5 and 3.5 are very exposed to load balancing, thus yielding the question: where is located the load-balancing involved in these variations?
Close to the target, confusing only the last hops? Then how to explain the local minimum at the 5.5-6.5-distance bracket? Far away from the target, “pushing” some nodes far closer to the target than seen on other routes? Then how to explain the positive coefficient in the linear part of the plot? Would there be two (or more) concurrent processes, with distinct domains of predominance, balancing at average distance 6?