Improving knowledge of paedophile activity in P2P systems is a crucial societal concern, with important consequences for child protection, policy making, and internet regulation. However, because traces of P2P exchanges and rigorous analysis methodologies are both lacking, current knowledge of this activity remains very limited. Here we consider a widely used P2P system, eDonkey, and focus on two key statistics: the fraction of paedophile queries entered in the system and the fraction of users who entered such queries. We collect hundreds of millions of keyword-based queries. We design a paedophile query detection tool and establish its false positive and false negative rates through expert assessment. Using this tool and these rates, we then estimate the fraction of paedophile queries in our data. Finally, we design and apply methods for quantifying the users who entered such queries. We conclude that approximately 0.25% of queries are paedophile, and that more than 0.2% of users enter such queries. These statistics are by far the most precise and reliable ever obtained in this domain.
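The abstract does not say exactly how the detector's error rates enter the final estimate. One standard way to correct an observed fraction for a detector with known false positive rate (FPR) and false negative rate (FNR) is the Rogan–Gladen correction, sketched below purely as an illustration; it is not necessarily the authors' method. It assumes the model observed = p(1 − FNR) + (1 − p)·FPR and solves for the true fraction p. The function name `corrected_fraction` is hypothetical.

```python
def corrected_fraction(observed: float, fpr: float, fnr: float) -> float:
    """Estimate the true fraction p of positives from the fraction `observed`
    flagged by an imperfect detector (Rogan-Gladen correction).

    Assumed model: observed = p * (1 - fnr) + (1 - p) * fpr
    """
    denom = 1.0 - fpr - fnr
    if denom <= 0:
        # The correction is only defined for a detector better than chance.
        raise ValueError("detector must satisfy fpr + fnr < 1")
    p = (observed - fpr) / denom
    # Sampling noise can push the raw estimate outside [0, 1]; clip it.
    return min(1.0, max(0.0, p))


if __name__ == "__main__":
    # Hypothetical rates chosen only to illustrate the call, not taken from the study.
    print(corrected_fraction(observed=0.003, fpr=0.001, fnr=0.2))
```

When the true fraction is small, even a low false positive rate is comparable in size to the signal, so subtracting FPR before rescaling matters more than the FNR term; the clipping step keeps the estimate a valid proportion.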