Paedophile keywords in eDonkey queries

> By Raphaël Fournier and Guillaume Valadon

On a P2P system, users submit keyword-based queries to a search engine. Some of them request paedophile content. This plot gives the distribution of the number of paedophile keywords contained in the queries sent to an eDonkey server during a ten-week experiment [1]. The x-axis shows the number of paedophile keywords, and the y-axis shows, on a logarithmic scale, the number of queries containing exactly that many paedophile keywords. A set of 21 paedophile keywords was gathered in a preliminary study [2]; they are believed to be tags for unambiguous paedophile content. Of the 127 million queries that were gathered, slightly more than 115,000 were identified as “paedophile” (i.e. they contain at least one paedophile keyword).
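The counting behind such a distribution can be sketched as follows. This is only an illustration, not the code used in the experiment: the keyword set is a hypothetical placeholder (the actual 21 keywords come from [2] and are not reproduced here), and both the whitespace tokenisation and the choice to count distinct keywords are assumptions.

```python
from collections import Counter

# Hypothetical placeholder tokens standing in for the 21 keywords of [2].
KEYWORDS = {"kw1", "kw2", "kw3"}


def keyword_count(query, keywords=KEYWORDS):
    """Number of distinct listed keywords among the query's tokens.

    Assumption: queries are split on whitespace and compared in lower case.
    """
    tokens = set(query.lower().split())
    return len(tokens & keywords)


def distribution(queries, keywords=KEYWORDS):
    """Map each keyword count (>= 1) to the number of queries having it."""
    counts = Counter(keyword_count(q, keywords) for q in queries)
    counts.pop(0, None)  # queries without any listed keyword are not plotted
    return dict(sorted(counts.items()))


sample = ["kw1 holiday", "kw1 kw2 video", "music album", "KW3"]
print(distribution(sample))  # {1: 2, 2: 1}
```

Plotting the resulting dictionary with a logarithmic y-axis would give a figure of the same kind as the one described above.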

One observes that most paedophile queries (95.8%) have only one paedophile keyword, and 99.5% contain three or fewer. This observation leads us to examine carefully the notions underlying this plot: “what is a paedophile query?” and “what is a paedophile user?”. Further questions arise, such as “is there a maximum number of paedophile keywords above which a query should be considered as submitted by a non-human user?” and “are there combinations of our keywords that are used to search for non-paedophile content?”.

In our work, we aim to count the number of paedophile users on this server. Note that the definition of a paedophile query has a strong impact on the resulting figure.

There were 50,801 IPs which made at least one paedophile query (that is, a query containing at least one paedophile keyword) during the experiment. In the table below, the second column gives the number of different IPs which submitted queries containing exactly N paedophile keywords. In the last column, IPs that fall into several categories are removed, so that no IP is counted more than once.

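| Keywords | Number of different IPs | Number of different IPs, doubles removed |
| --- | --- | --- |
| 1 | 49739 | 48389 |
| 2 | 1803 | 709 |
| 3 | 603 | 230 |
| 4 | 211 | 77 |
| 5 | 47 | 9 |
| 6 | 54 | 10 |
| 7 | 19 | 1 |
| 8 | 3 | 1 |
| 9 | 1 | 0 |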
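
One possible way to derive the two IP columns is sketched below. It rests on our reading of “doubles removed”, namely that an IP appearing in more than one keyword-count category is dropped from the last column altogether. The `(ip, n_keywords)` record format and the function name `ip_columns` are hypothetical.

```python
from collections import defaultdict


def ip_columns(records):
    """Return (all_ips, exclusive) counts per keyword count N.

    records: iterable of (ip, n_keywords) pairs, one per query
             (hypothetical input format).
    all_ips[N]:   IPs that sent at least one query with exactly N keywords.
    exclusive[N]: IPs whose paedophile queries all had exactly N keywords.
    """
    categories = defaultdict(set)  # ip -> set of keyword counts seen
    for ip, n in records:
        if n >= 1:  # keep paedophile queries only
            categories[ip].add(n)

    all_ips = defaultdict(int)
    exclusive = defaultdict(int)
    for ns in categories.values():
        for n in ns:
            all_ips[n] += 1
        if len(ns) == 1:  # the IP belongs to a single category
            exclusive[next(iter(ns))] += 1
    return dict(all_ips), dict(exclusive)


records = [("a", 1), ("a", 1), ("b", 1), ("b", 2), ("c", 2), ("d", 3), ("d", 1)]
second, third = ip_columns(records)
print(second)  # counts per N, every category an IP falls into
print(third)   # counts per N, single-category IPs only
```

In this toy input, IPs "b" and "d" each fall into two categories, so they contribute to the second column twice and to the last column not at all.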

The differences between the second and third columns show how much the definition of a paedophile user affects the results. The sum of the last column is 49426; its difference from the total number of paedophile IPs (50,801) shows that only a small fraction (1375, which is 2.7% of the total) of paedophile IPs submitted queries with differing numbers of keywords. Our future work will investigate combinations of keywords.
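
The arithmetic in the previous paragraph can be checked directly from the table. The variable names below are ours; the numbers are those given in the text.

```python
# Last column of the table: IPs counted in a single category only.
exclusive = [48389, 709, 230, 77, 9, 10, 1, 1, 0]
total_ips = 50801  # IPs with at least one paedophile query

single_category = sum(exclusive)              # 49426
multi_category = total_ips - single_category  # 1375
share = 100 * multi_category / total_ips      # about 2.7 (percent)

print(single_category, multi_category, round(share, 1))  # 49426 1375 2.7
```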
