Number of file-id discovered in a client-side eDonkey measurement

> By Christophe Berger, Clémence Magnien, Matthieu Latapy, Firas Bessadok and Phillipe Jarlov

[Plot: number of file-id discovered in a client-side eDonkey measurement]

We measure the files available in eDonkey as follows. Our client connects to every eDonkey server it discovers: it starts from an initial list of known servers and explores the set of all servers reachable from them. Every 12 hours, it then sends a fixed set of keyword-based queries to all these servers. In this measurement, the queries consisted of a set of general keywords and a set of specific paedophile keywords.
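The procedure above can be sketched as follows. This is only an illustration under stated assumptions, not the code of our client: `discover_servers`, `run_round`, the `neighbours` and `search` callables, and the toy data are all hypothetical names, and no real eDonkey network call is made.

```python
from collections import deque

def discover_servers(initial, neighbours):
    """Breadth-first exploration of all servers reachable from an initial list.

    `neighbours` is a hypothetical callable: server -> servers it advertises.
    """
    seen = set(initial)
    queue = deque(initial)
    while queue:
        server = queue.popleft()
        for other in neighbours(server):
            if other not in seen:
                seen.add(other)
                queue.append(other)
    return seen

def run_round(servers, keywords, search, seen_ids):
    """Send every keyword query to every server and record unseen file-ids.

    `search` is a hypothetical callable: (server, keyword) -> (file_id, name) pairs.
    Returns the number of file-ids discovered for the first time in this round.
    """
    before = len(seen_ids)
    for server in servers:
        for keyword in keywords:
            for file_id, name in search(server, keyword):
                seen_ids.setdefault(file_id, name)
    return len(seen_ids) - before

# Toy stand-ins for the network (assumptions, for illustration only).
TOY_LINKS = {"s1": ["s2"], "s2": ["s3", "s1"], "s3": []}
TOY_INDEX = {
    ("s1", "music"): [("a1", "music one.mp3")],
    ("s3", "music"): [("a1", "music one.mp3"), ("b2", "music two.mp3")],
}

servers = discover_servers(["s1"], lambda s: TOY_LINKS.get(s, []))
seen_ids = {}
toy_search = lambda s, k: TOY_INDEX.get((s, k), [])
new_first = run_round(sorted(servers), ["music"], toy_search, seen_ids)
new_second = run_round(sorted(servers), ["music"], toy_search, seen_ids)
```

In an actual measurement, a round like `run_round` would be repeated every 12 hours; here the second round finds nothing new because the toy index does not change.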

We ran this measurement for 140 days, which led to the observation of 2 784 583 distinct files. Among these files, 701 857 had a paedophile keyword in their name. The plot above displays the evolution of the number of observed files of each kind during the measurement.
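A curve of this kind can be derived from the per-round observations by keeping two growing sets of file-ids. The sketch below is a minimal illustration on toy data; the function name `cumulative_counts`, the case-insensitive substring match on filenames, and the placeholder keyword are assumptions, not a description of how our plot was produced.

```python
def cumulative_counts(rounds, flagged_keywords):
    """Return, after each round, (distinct files seen, distinct files whose
    name contains one of the flagged keywords).

    `rounds` is a list of rounds; each round is a list of (file_id, name) pairs.
    Matching is a case-insensitive substring test (an assumption of this sketch).
    """
    lowered = [k.lower() for k in flagged_keywords]
    seen_all, seen_flagged = set(), set()
    curve = []
    for observations in rounds:
        for file_id, name in observations:
            seen_all.add(file_id)
            if any(k in name.lower() for k in lowered):
                seen_flagged.add(file_id)
        curve.append((len(seen_all), len(seen_flagged)))
    return curve

# Toy observations with a neutral placeholder keyword.
toy_rounds = [
    [("a1", "holiday video.avi"), ("b2", "KEYWORD clip.mpg")],
    [("a1", "holiday video.avi"), ("c3", "song.mp3")],
    [("d4", "keyword archive.zip")],
]
curve = cumulative_counts(toy_rounds, ["keyword"])
```

Each file-id is counted once, however many rounds or servers report it, so both counts can only grow over time.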

It is clear that we keep discovering significant numbers of new files, even after 140 days of measurement. This may indicate that new files continuously appear at a high rate, and/or that the number of files is so large that even such measurements fail to obtain a full list. Notice also that the number of files with a paedophile keyword in their name is huge, raising important societal concerns.

Note, however, that filenames may differ significantly from the actual content of files. Also, this measurement does not allow us to deduce the fraction of all files that have a paedophile name. Obtaining such insight is extremely challenging, and is the goal of the Measurement and Analysis of P2P Activity Against Paedophile Content project.
