Daniel F. Bernardes, Matthieu Latapy, Fabien Tarissan

Understanding the spread of information on complex networks is a key issue from both a theoretical and an applied perspective. Despite the effort put into developing theoretical models for this phenomenon, gauging them against large-scale real-world data remains an important challenge because open, extensive and detailed data are scarce. In this paper, we explain how traces of peer-to-peer file sharing may be used to this end. We also perform simulations to assess how well the standard SIR model mimics key properties of spreading cascades. We examine the impact of the network topology on the observed properties and finally turn to the evaluation of two heterogeneous versions of the SIR model. We conclude that all the models tested fail to reproduce key properties of such cascades: typically, real spreading cascades are relatively “elongated” compared to simulated ones. We also observe some interesting similarities common to all the SIR models tested.
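To make the idea of a cascade's shape concrete, here is a minimal sketch of an SIR-style simulation on a graph. It assumes a simplified discrete-time variant in which every infected node gets a single chance to infect each susceptible neighbour and then recovers; the function names, the toy graph and the infection probability are illustrative assumptions, not the setup used in the paper. Comparing the depth of the infection tree to its size gives a rough sense of how "elongated" a cascade is.

```python
import random
from collections import deque


def simulate_sir_cascade(adjacency, seed_node, infect_prob, rng):
    """Simplified SIR: each infected node tries once to infect every
    susceptible neighbour with probability infect_prob, then recovers.
    Returns a dict mapping each reached node to its depth in the
    infection tree (the seed has depth 0)."""
    depth = {seed_node: 0}
    frontier = deque([seed_node])
    while frontier:
        node = frontier.popleft()  # this node is infectious now, recovered afterwards
        for neighbour in adjacency[node]:
            # Only nodes never reached before are still susceptible.
            if neighbour not in depth and rng.random() < infect_prob:
                depth[neighbour] = depth[node] + 1
                frontier.append(neighbour)
    return depth


def cascade_shape(depth):
    """Return (size, depth) of a cascade produced by simulate_sir_cascade."""
    return len(depth), max(depth.values())


# Hypothetical toy network: a path 0 - 1 - 2 - 3.
path_graph = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
rng = random.Random(0)
print(cascade_shape(simulate_sir_cascade(path_graph, 0, 1.0, rng)))  # → (4, 3)
```

With an infection probability of 1 the cascade covers the whole path, so its depth is as large as its size allows; on denser graphs the same size is typically reached with a much smaller depth.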

Alice Albano, Jean-Loup Guillaume, and Bénédicte Le Grand

Many studies have been made on diffusion in the field of epidemiology, and in the last few years, the development of social networking has induced new types of diffusion. In this paper, we focus on file diffusion on a peer-to-peer dynamic network using the eDonkey protocol. On this network, we observe a linear behavior of the actual file diffusion. This result is interesting, because most diffusion models exhibit exponential behaviors. In this paper, we propose a new model of diffusion, based on the SI (Susceptible / Infected) model, which produces results close to the linear behavior of the observed diffusion. We then justify the linearity of this model, and we study its behavior in more detail.
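For contrast with the linear behavior reported above, the following sketch shows why the standard SI model is usually described as exponential in its early phase. It uses the textbook well-mixed, discrete-time SI update, which is an assumption made here for illustration and is not the new model proposed in the paper; the function name and parameter values are hypothetical.

```python
def si_infected_counts(population, beta, steps, initial_infected=1):
    """Standard well-mixed SI model in discrete time.

    At each step the number of new infections is
    beta * infected * susceptible / population.
    Returns the number of infected individuals after each step,
    starting with the initial count."""
    infected = float(initial_infected)
    counts = [infected]
    for _ in range(steps):
        susceptible = population - infected
        infected += beta * infected * susceptible / population
        counts.append(infected)
    return counts


# Hypothetical parameters chosen only to show the growth pattern.
counts = si_infected_counts(population=10_000, beta=0.5, steps=10)
early_ratios = [counts[i + 1] / counts[i] for i in range(3)]
print(early_ratios)  # each ratio is close to 1 + beta while few nodes are infected
```

While almost everyone is still susceptible, the infected count is multiplied by roughly `1 + beta` at each step, which is the exponential regime that a linear diffusion curve departs from.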

Posted in Papers Also tagged diffusion, DynamicNetworks

Matthieu Latapy, Clémence Magnien and Raphaël Fournier

Increasing knowledge of paedophile activity in P2P systems is a crucial societal concern, with important consequences on child protection, policy making, and internet regulation. Because of a lack of traces of P2P exchanges and rigorous analysis methodology, however, current knowledge of this activity remains very limited. We consider here a widely used P2P system, eDonkey, and focus on two key statistics: the fraction of paedophile queries entered in the system and the fraction of users who entered such queries. We collect hundreds of millions of keyword-based queries; we design a paedophile query detection tool for which we establish false positive and false negative rates using assessment by experts; with this tool and these rates, we then estimate the fraction of paedophile queries in our data; finally, we design and apply methods for quantifying users who entered such queries. We conclude that approximately 0.25 % of queries are paedophile, and that more than 0.2 % of users enter such queries. These statistics are by far the most precise and reliable ever obtained in this domain.
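The step from a detection tool's raw output to an estimated fraction can be sketched with a standard misclassification correction. The code below assumes that the false positive rate is the probability that a non-matching query is flagged and the false negative rate is the probability that a matching query is missed; the paper may define its rates and its estimator differently, and the function name and numbers are hypothetical.

```python
def corrected_fraction(observed_fraction, false_positive_rate, false_negative_rate):
    """Estimate the true fraction p of positive items from the fraction
    flagged by an imperfect detector.

    Assumed model (not taken from the paper):
        observed = p * (1 - fn) + (1 - p) * fp
    which gives
        p = (observed - fp) / (1 - fp - fn)
    """
    denominator = 1.0 - false_positive_rate - false_negative_rate
    if denominator <= 0:
        raise ValueError("error rates too high for the correction to be defined")
    estimate = (observed_fraction - false_positive_rate) / denominator
    # Sampling noise can push the estimate slightly outside [0, 1].
    return min(1.0, max(0.0, estimate))


# Hypothetical rates and a synthetic observed fraction built from a known p.
true_p, fp, fn = 0.0025, 0.001, 0.2
observed = true_p * (1 - fn) + (1 - true_p) * fp
print(corrected_fraction(observed, fp, fn))
```

When the true fraction is this small, even a low false positive rate contributes a sizeable share of the flagged items, which is why the error rates need to be measured carefully before the raw count can be trusted.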

Lamia Benamara and Clémence Magnien

In many systems, such as P2P systems, the dynamicity of participating elements, or *churn*, has a strong impact. As a consequence, many efforts have been made to characterize it, and in particular to capture the session length distribution. However, in most cases, estimating it rigorously is difficult. One of the reasons is that, because the observation window is by definition finite, parts of the sessions that begin before the window and/or end after it are missed. This induces a bias. Although it tends to decrease when the observation window length increases, it is difficult to quantify its importance, or how fast it decreases.

Here, we introduce a general methodology that allows us to know if the observation window is long enough to characterize a given property. This methodology is not specific to one study case and may be applied to any property in a dynamic system. We apply this methodology to the study of session lengths in a massive measurement of P2P activity in the eDonkey system. We show that the measurement needs to last for at least one week in order to obtain representative results. We also show that our methodology allows us to precisely characterize the shape of the session length distribution.
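The bias caused by a finite observation window can be illustrated with a small sketch. It only shows how sessions get truncated by the window and how the share of truncated sessions shrinks when the window grows; it is not the methodology introduced in the paper, and the function names and session data are made up for the example.

```python
def observe_sessions(sessions, window_start, window_end):
    """Return what an observer limited to [window_start, window_end] sees.

    sessions is a list of (start, end) pairs. Each visible session yields
    (observed_length, truncated), where truncated is True when the session
    began before the window or ended after it."""
    observed = []
    for start, end in sessions:
        if end <= window_start or start >= window_end:
            continue  # entirely outside the window: never seen at all
        clipped_start = max(start, window_start)
        clipped_end = min(end, window_end)
        truncated = start < window_start or end > window_end
        observed.append((clipped_end - clipped_start, truncated))
    return observed


def truncated_fraction(observed):
    """Share of observed sessions whose true length is unknown."""
    if not observed:
        return 0.0
    return sum(1 for _, truncated in observed if truncated) / len(observed)


# Hypothetical sessions, in arbitrary time units.
sessions = [(0, 5), (3, 12), (8, 9), (15, 30)]
print(truncated_fraction(observe_sessions(sessions, 4, 16)))  # → 0.75
print(truncated_fraction(observe_sessions(sessions, 0, 30)))  # → 0.0
```

Tracking a statistic such as this one for increasing window lengths is one simple way to see whether it has stabilised, in the spirit of asking whether the window is long enough.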

> Lamia Benamara and Clémence Magnien When trying to characterize the dynamics of a system, we are faced with two problems. First, the observation window must be long enough to be representative. Second, the fact that it is finite still induces a bias in the observations: sessions beginning/ending before/after the measurement window are not seen […]

> By Oussama Allali, Matthieu Latapy and Clémence Magnien Link prediction is a key research problem within the analysis of network dynamics. It aims at predicting the links which will appear in future evolution of the network. We consider here a set of peers and files, where each peer is linked to the files it […]

Posted in Plots Also tagged bipartite, dynamics, link prediction

> By Raphaël Fournier and Matthieu Latapy P2P systems are known to host a large amount of paedophile activity. Thus, quantifying the number of paedophile users on a P2P system is crucial, for many reasons: easy access to such content is a major societal concern, policy making and law-enforcement budgeting rely on this figure and […]

> Bénédicte Le Grand This plot represents the evolution of the popularity of the keywords ‘avi’, ‘madonna’ and ‘jackson’ in eDonkey queries (captured on an eDonkey server during 102 days in 2009). The values on the y-axis represent the proportion of occurrences of these keywords in eDonkey queries for each day of the capture (x-axis). […]

> By Christophe Berger, Clémence Magnien, Matthieu Latapy, Firas Bessadok and Phillipe Jarlov We conduct a measurement of files available in eDonkey as follows. Our client connects to all eDonkey servers it discovers (it knows an initial list of servers and explores the set of all servers reachable from these). Then it sends every 12 […]

Posted in Plots Also tagged antipaedo, measurement

> By Clémence Magnien and Matthieu Latapy When one wants to study a complex network, one generally first has to conduct an intricate and expensive measurement. This measurement gives a sample of the network which is generally partial and may be biased. In Complex Network Measurements: Estimating the Relevance of Observed Properties we propose […]

Posted in Videos Also tagged measurement, Metrology