Quantifying paedophile queries in a large P2P system

Matthieu Latapy, Clémence Magnien and Raphaël Fournier

IEEE Infocom Mini-Conference 2011, Shanghai

Increasing knowledge of paedophile activity in P2P systems is a crucial societal concern, with important consequences for child protection, policy making, and internet regulation. However, because traces of P2P exchanges and rigorous analysis methodology are lacking, current knowledge of this activity remains very limited. We consider here a widely used P2P system, eDonkey, and focus on two key statistics: the fraction of paedophile queries entered in the system and the fraction of users who entered such queries. We proceed in four steps. First, we collect hundreds of millions of keyword-based queries. Second, we design a paedophile query detection tool, for which we establish false positive and false negative rates using assessment by experts. Third, with this tool and these rates, we estimate the fraction of paedophile queries in our data. Finally, we design and apply methods for quantifying users who entered such queries. We conclude that approximately 0.25% of queries are paedophile, and that more than 0.2% of users enter such queries. These statistics are by far the most precise and reliable ever obtained in this domain.
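The abstract describes estimating the true fraction of paedophile queries from the output of an imperfect detector whose error rates are known. The sketch below illustrates one standard way to do this kind of correction (the Rogan-Gladen estimator); it is not taken from the paper, whose exact rate definitions and estimator may differ. It assumes the false positive rate is the share of non-paedophile queries that get flagged and the false negative rate is the share of paedophile queries that are missed. The function name and all numeric inputs are hypothetical.

```python
def corrected_fraction(observed_fraction: float,
                       false_positive_rate: float,
                       false_negative_rate: float) -> float:
    """Correct an observed detection fraction for known detector error rates.

    Assumed model (not confirmed by the source):
        observed = true * (1 - FNR) + (1 - true) * FPR
    Solving for `true` gives the Rogan-Gladen estimator below.
    """
    denominator = 1.0 - false_positive_rate - false_negative_rate
    if denominator <= 0.0:
        # The detector is no better than chance; the correction is undefined.
        raise ValueError("error rates too large to correct the estimate")
    estimate = (observed_fraction - false_positive_rate) / denominator
    # Sampling noise can push the raw estimate slightly outside [0, 1].
    return min(1.0, max(0.0, estimate))


if __name__ == "__main__":
    # Hypothetical inputs, chosen only to exercise the formula.
    print(corrected_fraction(observed_fraction=0.003,
                             false_positive_rate=0.001,
                             false_negative_rate=0.2))
```

When the true fraction is very small, as it is here, even a tiny false positive rate accounts for a large share of the flagged queries, which is why establishing both error rates matters for this kind of estimate.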

Quantifying paedophile users on a P2P system

Number of file-id discovered in a client-side eDonkey measurement

Paedophile keywords in eDonkey queries

Measurement of eDonkey Activity with Distributed Honeypots

Files diffusion in an eDonkey P2P system

Time between queries in a P2P system

Ages in queries and filenames

Paedophile content in Peer-to-Peer exchanges

P2P file size distribution
