Quantifying Paedophile Activity in a Large P2P System


Matthieu Latapy, Clémence Magnien and Raphaël Fournier

in Information Processing and Management, Volume 49, Issue 1, January 2013, Pages 248–263


Increasing knowledge of paedophile activity in P2P systems is a crucial societal
concern, with important consequences for child protection, policy making, and
internet regulation. Because of a lack of traces of P2P exchanges and rigorous
analysis methodology, however, current knowledge of this activity remains very
limited. We consider here a widely used P2P system, eDonkey, and focus on two
key statistics: the fraction of paedophile queries entered in the system and the
fraction of users who entered such queries. We collect hundreds of millions of
keyword-based queries; we design a paedophile query detection tool for which we
establish false positive and false negative rates using assessment by experts;
with this tool and these rates, we then estimate the fraction of paedophile
queries in our data; finally, we design and apply methods for quantifying users
who entered such queries. We conclude that approximately 0.25% of queries are
paedophile, and that more than 0.2% of users enter such queries. These
statistics are by far the most precise and reliable ever obtained in this
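
The step of combining a detection tool with its measured error rates can be illustrated with a standard misclassification correction (often called the Rogan-Gladen estimator). This is a minimal sketch and not necessarily the estimator used in the paper: it assumes the false positive rate is the probability that a non-matching query is flagged and the false negative rate is the probability that a matching query is missed. The function name and all numeric values are hypothetical and chosen only for illustration.

```python
def corrected_fraction(observed_fraction, false_positive_rate, false_negative_rate):
    """Estimate the true fraction of matching items from the flagged fraction.

    Assumed model (not taken from the paper):
        observed = true * (1 - fn) + (1 - true) * fp
    Solving for `true` gives (observed - fp) / (1 - fp - fn).
    """
    denominator = 1.0 - false_positive_rate - false_negative_rate
    if denominator <= 0.0:
        raise ValueError("error rates too high: the tool carries no usable signal")
    estimate = (observed_fraction - false_positive_rate) / denominator
    # Sampling noise can push the raw estimate slightly outside [0, 1].
    return min(1.0, max(0.0, estimate))


if __name__ == "__main__":
    # Hypothetical values for illustration only.
    true_fraction = 0.01
    fp_rate = 0.002
    fn_rate = 0.25
    # Fraction of queries the tool would flag under the assumed model.
    observed = true_fraction * (1 - fn_rate) + (1 - true_fraction) * fp_rate
    print(corrected_fraction(observed, fp_rate, fn_rate))
```

Under this model the flagged fraction alone would misstate the true fraction, which is why the error rates have to be established before any estimate is reported.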
