> Lamia Benamara and Clémence Magnien

When trying to characterize the dynamics of a system, we face two problems. First, the observation window must be long enough to be representative. Second, because the window is finite, it still biases the observations: sessions that begin before the measurement window or end after it are not seen completely.

To determine whether the observation window is long enough to characterize session lengths, we extract windows that are shorter than the actual observation window and compare the session lengths obtained for them to those obtained for the full observation window. We apply this methodology to the study of session lengths in a massive measurement of P2P activity in the eDonkey system. We aim to characterize the shape of the session length distribution.
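As a minimal sketch of the window-extraction step, the function below truncates a set of sessions to a shorter window and returns the lengths that would be observed through it. The function name `observed_lengths` and the representation of a session as a `(start, end)` pair in seconds are assumptions made for illustration, not the authors' implementation.

```python
def observed_lengths(sessions, window_start, window_end):
    """Return session lengths as seen through [window_start, window_end].

    `sessions` is an iterable of (start, end) times in seconds
    (a hypothetical representation chosen for this sketch).
    Sessions crossing a window boundary are truncated, which is the
    bias induced by a finite observation window.
    """
    lengths = []
    for start, end in sessions:
        # Keep only the part of the session that falls inside the window.
        seen_start = max(start, window_start)
        seen_end = min(end, window_end)
        if seen_start <= seen_end:  # the session overlaps the window
            lengths.append(seen_end - seen_start)
    return lengths


sessions = [(0, 100), (50, 500), (400, 450), (900, 1000)]
short = observed_lengths(sessions, 0, 200)   # shorter window
full = observed_lengths(sessions, 0, 1000)   # full observation window
```

Comparing `short` with `full` shows the effect described above: the session `(50, 500)` appears with length 150 in the short window and 450 in the full one.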

One problem in identifying sessions is that we have no information about the connections and disconnections of users: this notion does not exist in the UDP protocol of eDonkey. Instead, we only know the moment when a user makes a query, and we have to infer session lengths from the time elapsed between consecutive queries made by the same user. We consider that two consecutive queries belong to the same session if the time elapsed between them is less than some threshold T (we choose T = 3 hours).
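The threshold rule can be sketched as follows. The input format (an iterable of `(user_id, timestamp)` pairs with timestamps in seconds) and the name `infer_sessions` are assumptions for illustration; only the rule itself, a gap shorter than T keeps two queries in the same session, comes from the text.

```python
from collections import defaultdict

T = 3 * 3600  # threshold in seconds (3 hours, as chosen in the text)


def infer_sessions(queries, threshold=T):
    """Group each user's queries into sessions.

    `queries` is an iterable of (user_id, timestamp) pairs, with
    timestamps in seconds (hypothetical input format).
    Returns a dict mapping user_id to a list of (start, end) sessions.
    """
    by_user = defaultdict(list)
    for user, ts in queries:
        by_user[user].append(ts)

    sessions = {}
    for user, stamps in by_user.items():
        stamps.sort()
        user_sessions = []
        start = prev = stamps[0]
        for ts in stamps[1:]:
            # A gap of at least `threshold` closes the current session.
            if ts - prev >= threshold:
                user_sessions.append((start, prev))
                start = ts
            prev = ts
        user_sessions.append((start, prev))
        sessions[user] = user_sessions
    return sessions


queries = [("a", 0), ("a", 3600), ("a", 14400), ("b", 100)]
result = infer_sessions(queries)
```

With this definition the length of a session is `end - start`, so a session made of a single query has length 0.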

This plot shows the complementary cumulative distribution of session lengths for different observation window lengths (1 hour, 12 hours, 4 days and 1 week). The shape evolves: for measurement durations of up to one day (not represented for clarity), the distribution exhibits a clear cut-off. When the measurement duration grows (starting at 4 days), the shape of the distribution changes: the tail flattens after a bend occurring close to 100,000 seconds (about 28 hours). Values above this bend are significantly rarer than values below it, and we consider them extreme values. Moreover, the distribution for 4 days is close to that for 1 week. Comparing the distributions obtained for 1 and 2 weeks (not represented for clarity) shows that they are almost identical.
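For reference, a complementary cumulative distribution can be computed from a list of session lengths as sketched below. This sketch assumes the convention P(X > x), the fraction of sessions strictly longer than x; the text does not say which convention the plot uses, and the name `ccdf` is chosen here for illustration.

```python
from collections import Counter


def ccdf(values):
    """Return a list of (x, P(X > x)) pairs, sorted by increasing x.

    Assumes the strict convention P(X > x); the source does not
    specify whether its plot uses > or >=.
    """
    counts = Counter(values)
    n = len(values)
    points = []
    remaining = n  # number of values not yet passed
    for x in sorted(counts):
        remaining -= counts[x]  # drop the values equal to x
        points.append((x, remaining / n))
    return points


points = ccdf([1, 2, 2, 4])
```

Plotting such points on logarithmic axes for each window length gives curves whose tails can be compared, as the paragraph above does for the cut-off and the bend.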

We can conclude that an observation window of at least one week is long enough to characterize session lengths. For such windows, the shape of the session length distribution can be determined independently of the observation window length; this distribution is characterized by normal session lengths and by extreme values (i.e. values beyond the bend). While the distribution of the normal session lengths is independent of the observation window length, the extreme values elude statistical characterization. They are therefore dependent on the observation window length, though they do not alter the shape of the distribution.

[1] F. Aidouni, M. Latapy, and C. Magnien. Ten weeks in the life of an eDonkey server. In Proceedings of HotP2P’09, 2009.