Understanding the spread of information on complex networks is a key issue from both a theoretical and an applied perspective. Despite the effort devoted to developing theoretical models of this phenomenon, gauging them against large-scale real-world data remains an important challenge because open, extensive and detailed data are scarce. In this paper, we explain how traces of peer-to-peer file sharing may be used for this purpose. We also perform simulations to assess how well the standard SIR model mimics key properties of spreading cascades. We examine the impact of the network topology on the observed properties and finally turn to the evaluation of two heterogeneous versions of the SIR model. We conclude that all the models tested fail to reproduce key properties of such cascades: typically, real spreading cascades are relatively “elongated” compared to simulated ones. We also observe some interesting similarities common to all SIR models tested.
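To make the notion of an "elongated" cascade concrete, the following is a minimal sketch of a discrete-time SIR simulation on a graph that records the depth of each infected node in the cascade tree. The simulation setup used in the paper is not specified here, so the Erdős–Rényi topology, the per-step probabilities `beta` and `gamma`, and all function names are illustrative assumptions.

```python
import random


def random_graph(n, p, rng):
    """Erdős–Rényi G(n, p) as an adjacency list (assumed topology, for illustration)."""
    adj = {v: set() for v in range(n)}
    for u in range(n):
        for v in range(u + 1, n):
            if rng.random() < p:
                adj[u].add(v)
                adj[v].add(u)
    return adj


def sir_cascade(adj, seed, beta, gamma, rng):
    """Run a discrete-time SIR process from a single seed node.

    beta  : probability that an infected node infects a given susceptible
            neighbour during one step (assumed parameterisation)
    gamma : probability that an infected node recovers at the end of a step
    Returns {node: depth}, the depth of every node ever infected in the
    cascade tree (the seed has depth 0).
    """
    if not 0.0 < gamma <= 1.0:
        raise ValueError("gamma must be in (0, 1] so that the process terminates")
    depth = {seed: 0}
    infected = {seed}
    while infected:
        newly = {}
        for u in sorted(infected):
            for v in sorted(adj[u]):
                # v is susceptible if it has never been infected
                if v not in depth and v not in newly and rng.random() < beta:
                    newly[v] = depth[u] + 1
        # nodes that do not recover stay infectious for the next step
        infected = {u for u in infected if rng.random() >= gamma}
        depth.update(newly)
        infected |= set(newly)
    return depth


rng = random.Random(0)
graph = random_graph(200, 0.03, rng)
cascade = sir_cascade(graph, seed=0, beta=0.3, gamma=0.5, rng=rng)
print("cascade size:", len(cascade), "max depth:", max(cascade.values()))
```

Comparing the maximum depth to the cascade size over many runs gives one possible measure of how elongated simulated cascades are; whether this matches the properties measured in the paper is not established here.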

Many studies of diffusion have been carried out in the field of epidemiology, and in the last few years, the development of social networking has induced new types of diffusion. In this paper, we focus on file diffusion on a dynamic peer-to-peer network using the eDonkey protocol. On this network, we observe a linear behavior of the actual file diffusion. This result is interesting, because most diffusion models exhibit exponential behaviors. In this paper, we propose a new model of diffusion, based on the SI (Susceptible / Infected) model, which produces results close to the linear behavior of the observed diffusion. We then justify the linearity of this model, and we study its behavior in more detail.
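For context, the exponential behavior mentioned above can be seen in the standard well-mixed SI model. This is the textbook model, not the new model proposed in the paper, which is not specified here. The symbols $N$ (population size), $I(t)$ (number of infected nodes), $I_0$ (initial number infected), $\beta$ (infection rate) and $c$ (a constant rate) are assumed notation.

```latex
\frac{dI}{dt} = \beta\, I(t)\left(1 - \frac{I(t)}{N}\right),
\qquad
I(t) = \frac{N I_0 e^{\beta t}}{N + I_0\left(e^{\beta t} - 1\right)},
\qquad
I(t) \approx I_0 e^{\beta t} \quad \text{while } I(t) \ll N .
```

A linear diffusion, by contrast, would correspond to $I(t) \approx I_0 + c\,t$, that is, a roughly constant number of new infections per unit of time.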

Increasing knowledge of paedophile activity in P2P systems is a crucial societal concern, with important consequences on child protection, policy making, and internet regulation. Because of a lack of traces of P2P exchanges and rigorous analysis methodology, however, current knowledge of this activity remains very limited. We consider here a widely used P2P system, eDonkey, and focus on two key statistics: the fraction of paedophile queries entered in the system and the fraction of users who entered such queries. We collect hundreds of millions of keyword-based queries; we design a paedophile query detection tool for which we establish false positive and false negative rates using assessment by experts; with this tool and these rates, we then estimate the fraction of paedophile queries in our data; finally, we design and apply methods for quantifying users who entered such queries. We conclude that approximately 0.25% of queries are paedophile, and that more than 0.2% of users enter such queries. These statistics are by far the most precise and reliable ever obtained in this domain.
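The step from a detection tool with known error rates to an estimated fraction can be illustrated with a standard misclassification correction. This is a sketch under stated assumptions and not necessarily the procedure used in the paper: the false positive rate is taken to mean the probability that a non-target query is flagged, the false negative rate the probability that a target query is missed, and the function name and the numbers in the usage lines are hypothetical.

```python
def corrected_fraction(observed, false_positive_rate, false_negative_rate):
    """Estimate the true fraction p of target items from the flagged fraction.

    Assumed model: observed = p * (1 - fn) + (1 - p) * fp,
    where fp = P(flagged | not target) and fn = P(not flagged | target).
    Solving for p gives (observed - fp) / (1 - fp - fn).
    """
    denom = 1.0 - false_positive_rate - false_negative_rate
    if denom <= 0:
        raise ValueError("error rates too large: the tool carries no information")
    estimate = (observed - false_positive_rate) / denom
    # clamp to [0, 1], since sampling noise can push the estimate outside
    return min(1.0, max(0.0, estimate))


# Hypothetical numbers, chosen only to exercise the formula.
flagged = 0.01 * (1 - 0.2) + (1 - 0.01) * 0.001
print(corrected_fraction(flagged, false_positive_rate=0.001, false_negative_rate=0.2))
```

When the target fraction is very small, as in the abstract, the estimate is highly sensitive to the false positive rate, which is one reason such rates need to be established carefully.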

In many systems, such as P2P systems, the dynamicity of participating elements, or *churn*, has a strong impact. As a consequence, many efforts have been made to characterize it, and in particular to capture the session length distribution. However, in most cases, estimating it rigorously is difficult. One of the reasons is that, because the observation window is by definition finite, parts of the sessions that begin before the window and/or end after it are missed. This induces a bias. Although it tends to decrease when the observation window length increases, it is difficult to quantify its importance, or how fast it decreases.

Here, we introduce a general methodology that allows us to know if the observation window is long enough to characterize a given property. This methodology is not specific to one case study and may be applied to any property in a dynamic system. We apply this methodology to the study of session lengths in a massive measurement of P2P activity in the eDonkey system. We show that the measurement needs to last for at least one week in order to obtain representative results. We also show that our methodology allows us to precisely characterize the shape of the session length distribution.
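The window-length bias and the idea of checking whether a window is long enough can be illustrated on synthetic data. The sketch below is not the methodology of the paper, whose details are not given here. It assumes exponentially distributed session lengths, keeps only sessions fully contained in the window, and declares a statistic stable once successive window lengths change it by less than a relative tolerance; all names and parameters are hypothetical.

```python
import random


def complete_session_mean(sessions, window):
    """Mean length of sessions that both start and end inside [0, window]."""
    lengths = [end - start for start, end in sessions
               if start >= 0 and end <= window]
    return sum(lengths) / len(lengths) if lengths else float("nan")


def first_stable_window(windows, values, tol):
    """Smallest window from which all successive values differ by at most
    tol (relative to the earlier value); None if no such window exists."""
    for i in range(len(windows) - 1):
        tail = values[i:]
        if all(abs(b - a) <= tol * abs(a) for a, b in zip(tail, tail[1:])):
            return windows[i]
    return None


# Synthetic sessions: uniform start times, exponential lengths (assumptions).
rng = random.Random(42)
TRUE_MEAN = 10.0
sessions = []
for _ in range(20000):
    start = rng.uniform(-100.0, 1000.0)
    sessions.append((start, start + rng.expovariate(1.0 / TRUE_MEAN)))

windows = [10, 20, 50, 100, 200, 500, 1000]
means = [complete_session_mean(sessions, w) for w in windows]
for w, m in zip(windows, means):
    print(f"window={w:5d}  mean complete-session length={m:.2f}")
print("first stable window (5% tolerance):", first_stable_window(windows, means, 0.05))
```

Short windows can only contain short complete sessions, so the estimated mean starts well below the true value and rises as the window grows. The same check could be applied to other statistics, such as quantiles of the distribution.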

