Link Prediction and Threads in Email Networks

Qinna Wang

2014 International Conference on Data Science and Advanced Analytics (DSAA2014)

We tackle the problem of predicting future links in dynamic networks. For this, we work with the Debian Mailing Lists. In this dataset, a user can post a question to the debian list and other users can reply it by email forming a thread. We show that the number of threads shared in the past between users is a better feature to predict future email exchanges than classical features, like the number of common neighbors. We also show that the structure of a thread do not match the traditional definition of a community, particularly a thread does not have many triangles and has many outgoing connections. While the number of shared (detected) communities is also a better feature to predict future email exchanges than traditional features, is not as good as the number of shared threads. We believe our work should raise interests in characterizing and detecting thread-like structures in dynamic networks.

Download

On the importance of considering social capitalism when measuring influence on Twitter

Maximilien Danisch, Nicolas Dugué and Anthony Perez

2014 International Conference on Behavioral, Economic, and Socio-Cultural Computing (BESC 2014)

Influence on Twitter has drawn a lot of attention these past few years since this microblogging service is used to share, seek or debate about any kind of information. Several tools providing so-called influential scores have thus been proposed. However, the algorithms behind them are kept secret and it is not clear how they consider influence. Yet, many users rely on such tools to evaluate and even try to improve their influence in the Twitter network. In a recent work, it has been shown that automatic accounts can obtain high influential scores with no intuitive reason. Extending and completing this work, we show that such measures fail at distinguishing so-called social capitalists from real, truthful users. This enlights the fact that actual scores do not seem to consider the way followers and interactions are obtained on the network. To overcome this issue, we define a classifier that discriminates social capitalists from truthful users. To that aim, we crawled the Twitter network to gather examples of certified social capitalists and regular users and obtained features related to the profile and behavior of each user. We then use such a classifier to balance Klout’s score to adjust influential scores. We also developed an application that allows using our classifier online. We believe our work should raise the question of the legitimacy of influence on Twitter, and lead to significant improvements in the way it is measured.

Download

Learning a Proximity Measure to Complete a Community

Maximilien Danisch, Jean-Loup Guillaume and Bénédicte Le Grand

2014 International Conference on Data Science and Advanced Analytics (DSAA2014)

In large-scale online complex networks (Wikipedia, Facebook, Twitter, etc.) finding nodes related to a specific topic is a strategic research subject. This article focuses on two central notions in this context: communities (groups of highly connected nodes) and proximity measures (indicating whether nodes are topologically close). We propose a parameterized proximity measure which, given a set of nodes belonging to a community, learns the optimal parameters and identifies the other nodes of this community, called multi-ego-centered community as it is centered on a set of nodes. We validate our results on a large dataset of categorized Wikipedia pages and on benchmarks, we also show that our approach performs better than existing ones. Our main contributions are (i) a new ergonomic parametrized proximity measure, (ii) the automatic tuning of the proximity’s parameters and (iii) the unsupervised detection of community boundaries.

Download

Direct Generation of Random Graphs Exactly Realising a Prescribed Degree Sequence

Darko Obradovi, Maximilien Danisch

The 6th International Conference on Computational Aspects of Social Networks (CASON2014)

This paper intends to extend the possibilites available to researchers for the evaluation of directed networks with the use of randomly generated graphs. The direct generation of a simple network with a prescribed degree sequence still seems to be an open issue, since the prominent configuration model usually does not realise the degree distribution exactly. We propose such an algorithm using a heuristic for node prioritisation. We demonstrate that the algorithm samples approximately uniformly. In comparison to the switching Markov Chain Monte Carlo algorithms, the direct generation of edges allows an easy modification of the linking behaviour in the random graph, introducing for example degree correlations, mixing patterns or community structure. That way, more specific random graphs can be generated (non-uniformly) in order to test hypotheses on the question, whether specific network features are due to a specific linking behaviour only. Or it can be used to generate series of synthetic benchmark networks with a specific community structure, including hierarchies and overlaps.

Download

Mining bipartite graphs to improve semantic paedophile activity detection

Raphaël Fournier, Maximilien Danisch

IEEE International Conference on Research Challenges in Information Science 2014 (RCIS2014)

Peer-to-peer (P2P) networks are popular to exchange large volumes of data through the Internet. Pedophile activity is a very important topic for our society and some< works have recently attempted to gauge the extent of pedophile exchanges on P2P networks. A key issue is to obtain an efficient detection tool, which may decide if a sequence of keywords is related to the topic or not. We propose to use social network analysis in a large dataset from a P2P network to improve a state-of-the-art filter for pedophile queries. We obtain queries and thus combinations of words which are not tagged by the filter but should be. We also perform some experiments to explore if the original four categories of paedophile queries were to be found by topological measures only.

Download

Comparing Pedophile Activity in Different P2P Systems

Raphaël Fournier, Thibault Cholez, Matthieu Latapy, Isabelle Chrisment, Clémence Magnien, Olivier Festor and Ivan Daniloff

Soc. Sci. 2014, 3(3), 314-325

Peer-to-peer (P2P) systems are widely used to exchange content over the Internet. Knowledge of pedophile activity in such networks remains limited, despite having important social consequences. Moreover, though there are different P2P systems in use, previous academic works on this topic focused on one system at a time and their results are not directly comparable. We design a methodology for comparing KAD and eDonkey, two P2P systems among the most prominent ones and with different anonymity levels. We monitor two eDonkey servers and the KAD network during several days and record hundreds of thousands of keyword-based queries. We detect pedophile-related queries with a previously validated tool and we propose, for the first time, a large-scale comparison of pedophile activity in two different P2P systems. We conclude that there are significantly fewer pedophile queries in KAD than in eDonkey (approximately 0.09% vs. 0.25%).

Download

Network Decomposition into Fixed Points of Degree Peeling

James Abello and François Queyroi

Social Network Analysis and Mining, Springer, 2014, 4 (1), pp.191

Degree peeling is used to study complex networks. It is a decomposition of the network into vertex groups of increasing minimum degree. However, the peeling value of a vertex is non-local in this context since it relies on the number of connections the vertex has to groups above it. We explore a different way to decompose a network into edge layers such that the local peeling value of the vertices on each layer does not depend on their non-local connections with the other layers. This corresponds to the decomposition of a graph into subgraphs that are invariant with respect to degree peeling, i.e. they are fixed points. We introduce a general method to partition the edges of an arbitrary graph into fixed points of degree peeling, called the iterative-edge-core decomposition. Information from this decomposition is used to formulate a novel notion of vertex diversity based on Shannon’s entropy. We illustrate the usefulness of this decomposition on a variety of social networks including weighted graphs. Our method can be used as a preprocessing step for community detection and graph visualization.

Download

Identifying Roles in an IP Network with Temporal and Structural Density

Tiphaine Viard and Matthieu Latapy

NetSciCom 2014

Captures of IP traffic contain much information on very different kinds of activities like file transfers, users interacting with remote systems, automatic backups, or distributed computations. Identifying such activities is crucial for an appropriate analysis, modeling and monitoring of the traffic. We propose here a notion of density that captures both temporal and structural features of interactions, and generalizes the classical notion of clustering coefficient. We use it to point out important differences between distinct parts of the traffic, and to identify interesting nodes and groups of nodes in terms of roles in the network.

Download

Measuring Routing Tables in the Internet

Élie Rotenberg, Christophe Crespelle and Matthieu Latapy

NetSciCom 2014

The most basic function of an Internet router is to decide, for a given packet, which of its interfaces it will use to forward it to its next hop. To do so, routers maintain a routing table, in which they look up for a prefix of the destination address. The routing table associates an interface of the router to this prefix, and this interface is used to forward the packet. We explore here a new measurement method based upon distributed UDP probing to estimate this routing table for Internet routers.

Download

Multi-ego-centred communities

Maximilien Danisch, Jean-Loup Guillaume and Bénédicte Le Grand

in « Complex Networks », pages 76-111, H. Cherifi (ed), Cambridge Scholars Publishing, 2014

The community structure of a graph is defined in various ways in the literature: partition, where nodes can belong to only one community. This vision is unrealistic and may lead to poor results because most nodes belong to several communities in real-world networks; overlapping community structure, which is the most natural view, but is often very difficult to identify in practice due to the complex structure of real-world networks and the huge potential number of such communities; egocentered community structure which focuses on individual nodes’ communities and seems to be a good compromise. In this chapter, the third vision is investigated; a new proximity measure based on opinion dynamics is proposed to score and select nodes according to their proximity to a node of interest. We call it the carryover opinion. In addition to be parameter-free, the carryover opinion can be calculated in a very time-efficient way and can thus be used in very large graphs. We also go further in the idea of egocentered communities by introducing the new concept of multi-egocentered communities, i.e., focusing on the communities of a set of nodes rather than of a single node. A key idea is that, although one node generally belongs to numerous communities, e.g., friends, colleagues, family, a small set of appropriate nodes can fully characterize a single community. We also show how to unfold all egocentered communities of a given node using this notion of multi-egocentered community.

Download

Multi-ego-centered communities in practice

Maximilien Danisch, Jean-Loup Guillaume and Bénédicte Le Grand

Social Network Analysis and Mining, Springer, 2014, 4 (1), pp.180.

We propose here a framework to unfold the ego-centered community structure of a given node in a network. The framework is not based on the optimization of a quality function, but on the study of the irregularity of the decrease of a proximity measure. It is a practical use of the notion of multi-ego-centered community and we validate the pertinence of the approach on benchmarks and a real-world network of wikipedia pages.

Download

Detecting events in the dynamics of ego-centred measurements of the Internet topology

Matthieu Latapy, Assia Hamzaoui, Clémence Magnien

Journal of Complex Networks 2(1): 38-59, 2014

Detecting events such as major routing changes or congestions in the dynamics of the Internet topology is an important but challenging task. We explore here an empirical approach based on a notion of statistically significant events. It consists in identifying properties of graph dynamics which exhibit a homogeneous distribution with outliers, corresponding to events. We apply this approach to ego-centred measurements of the  Internet topology (views obtained from a single monitor) and show that it succeeds in detecting meaningful events. Finally, we give some hints for   the interpretation of detected events in terms of network operations.

Download

Inadequacy of SIR Model to Reproduce Key Properties of Real-world Spreading Phenomena: Experiments on a Large-scale P2P System

Daniel Bernardes, Matthieu Latapy, Fabien Tarissan

In Journal of Social Network Analysis and Mining, 3(4):1195-1208,Springer, 2013.

Understanding the spread of information on complex networks is a key issue from a theoretical and applied perspective. Despite the effort in developing theoretical models for this phenomenon, gauging them with large-scale real-world data remains an important challenge due to the scarcity of open, extensive and detailed data. In this paper, we explain how traces of peer-to-peer file sharing may be used to this goal. We reconstruct the underlying social network of peers sharing content and perform simulations on it to assess the relevance of the standard SIR model to mimic key properties of real spreading cascades. First we examine the impact of the network topology on observed properties. Then we turn to the evaluation of two heterogeneous extensions of the SIR model. Finally we improve the social network reconstruction, introducing an affinity index between peers, and simulate a SIR model which integrates this new feature. We conclude that the simple, homogeneous model is insufficient to mimic real spreading cascades. Moreover, none of the natural extensions of the model we considered, which take into account extra topological properties, yielded satisfying results in our context. This raises an alert against the careless, widespread use of this model.

Download

On the relevance of the edge-Markovian evolving graph model for real mobile networks

Aurélie Faure de Pebeyre, Fabien Tarissan et Julien Sopena

IFIP Wireless Days conference (WD’13), Valencia, Spain, 2013

The development of wireless devices led the scientific community to focus more and more on systems of interaction composed of moving entities. In this context, different models have been proposed in an attempt to capture properties of the observed dynamics. Among those models, the edge-Markovian evolving graph model is appealing since it enables to highlight temporal dependencies in the evolution of the graphs. This model relies on two parameters accounting respectively for the creation and suppression of links in the graph. Thus it assumes that these two parameters are sufficient to characterise the dynamics during all the evolution of the graph. In this paper, we test this hypothesis by confronting the model to 6 datasets recording real traces of evolving networks. In particular, we study the proportion of created and deleted links over the time. The results show that 5 of the 6 case studies present an heterogeneous distribution of those fractions which contradicts the underlying hypothesis of the model. Besides, in order to understand the importance this might have as regard structural properties of real networks, we also study the impact the Markovian model has on the mean degree over the time. It turns out that even in the suitable case, the model fails to reproduce correctly this property which indicates its inadequacy for even more complex properties of real evolving networks

Download

Method of reliability and availability analysis – From the dynamic properties of routing and forwarding paths

Dimitri Papadimitriou, Davide Careglio, Fabien Tarissan and Piet Demeester

Proceedings of the International Workshop on Reliable Networks Design and Modeling (RNDM’13), 2013 (Invited paper)

Confronted to the increasing dynamic of Internet routing system and its underlying topology, we propose a reliability and availability analysis method based on the characterization of the dynamic properties (in particular, the stability properties) of routing paths and their corresponding forwarding paths. The key driver underlying this method is that transient but frequent changes in the spatio-temporal properties of routing paths can affect the performance and operating conditions of the corresponding forwarding paths; hence, their reliability. The results obtained by means of this method verify that, although the main cause of instability results from the forwarding plane dynamics, a second order effect relates forwarding and routing path instability events. Applying our analysis method reveals that the dominant source (main cause) of instability originates indeed from the forwarding plane. This result which confirms previous studies from 2003 further corroborates the assumption that the dynamic properties of routing system are mainly driven by its adaptation to the forwarding system (adaptive routing). However, even if the likelihood of forwarding instability becomes the prominent behavior (cause), about 50% of them induce routing path instability whereas the corresponding forwarding path remains unstable. This observation suggests that 50% of the reactive decisions performed by the BGP routing system (reactive routing) tend to further delay convergence of the forwarding paths. In turn, this observation provides the first indication that simple causal effects can’t explain anymore the occurrence of instability. Moreover, more elaborated analysis techniques (such as the one proposed in this paper) are required to explain the inter-dependent routing and forwarding paths dynamics which affects their reliability and availability.

Download

Un modèle pour les graphes bipartis aléatoires avec redondance

Émilie Coupechoux and Fabien Tarissan

4ème Journées Modèles et l’Analyse des Réseaux : Approches Mathématiques et Informatique (MARAMI’13), 2013

Current models of random graphs do not capture all the properties observed in realworld networks. In particular, two cliques in such models generally do not have more than one node in common, while it is intuitive that, in social networks for instance, two friends have more than one interest in common. The model presented here aims at capturing this kind of property. More precisely, we present a model for random tripartite graphs such that the bipartite projection has degree and redundancy distributions close to those of a given bipartite graph.

Download

valuation du modèle évolutif par arête-markovienne pour reproduire la dynamique des réseaux mobiles

Aurélie Faure de Pebeyre, Fabien Tarissan and Julien Sopena

In 4ème Journées Modèles et l’Analyse des Réseaux : Approches Mathématiques et Informatique (MARAMI’13), 2013

L’avènement des équipements mobiles a amené la communauté scientifique à étudier plus intensément les systèmes d’interactions formés par des entités en mouvement. Dans ce contexte, plusieurs modèles ont été proposés pour tenter de capturer les propriétés dynamiques de tels systèmes. Parmi ceux-ci, le modèle de graphes évolutifs à arêtes markoviennes est attirant en ce qu’il met en avant les dépendances temporelles dans un graphe dynamique. Ce modèle repose sur l’identification de deux paramètres régissant respectivement l’apparition et la disparition des liens dans le graphe et fait donc l’hypothèse que ces deux paramètres sont suffisants pour caractériser cette dynamique sur l’ensemble de la durée de vie du graphe. Dans cet article nous testons la pertinence de cette hypothèse par rapport à 6 jeux de données réelles. Pour se faire, nous avons étudié la fraction de liens créés et supprimés au cours du temps. Les résultats montrent que dans 5 cas sur les 6 étudiés, la répartition de ces fractions est hétérogène, ce qui contredit l’hypothèse faite par le modèle. De plus, nous avons regardé l’impact que le modèle markovien avait sur le degré moyen des nœuds au cours du temps. Il s’avère que même dans le jeu de données favorable au modèle, ce dernier échoue à rendre compte du comportement des réseaux dynamiques de façon satisfaisante.

Download

The Power of Consensus: Random Graphs Have No Communities

Romain Campigotto, Jean-Loup Guillaume and Massoud Seifi

Proceedings of the 5th IEEE/ACM International Conference on Advances in Social Networks and Mining (ASONAM 2013). Niagara Falls, Canada.

Communities are a powerful tool to describe the structure of complex networks. Algorithms aiming at maximizing a quality function called modularity have been shown to effectively compute the community structure. However, some problems remain: in particular, it is possible to find high modularity partitions in graph without any community structure, in particular random graphs. In this paper, we study the notion of consensual communities and show that they do not exist in random graphs. For that, we exhibit a phase transition based on the strength of consensus: below a given threshold, all the nodes belongs to the same consensual community; above this threshold, each node is in its own consensual community.

Download

Towards realistic modeling of IP-level routing topology dynamics

Clémence Magnien, Amélie Medem, Sergey Kirgizov, Fabien Tarissan

Networking Science, 4 (1-4), p. 24-33, 2013

Many works have studied the Internet topology, but few have investigated the question of how it evolves over time. This paper focuses on the Internet routing IP-level topology and proposes a first step towards realistic modeling of its dynamics. We study periodic measurements of routing trees from a single monitor to a fixed destination set and identify invariant properties of its dynamics. Based on those observations, we then propose a model for the underlying mechanisms of the topology dynamics. Our model remains simple as it only incorporates load-balancing phenomena and routing changes. By extensive simulations,  we show that, despite its simplicity, this model effectively captures the observed behaviors, thus providing key insights of relevant mechanisms governing the Internet routing dynamics. Besides, by confronting simulations over different kinds of topology, we also provide insights of which structural properties play a key role to explain the properties of the observed dynamics, which therefore strengthens the relevance of our model.

Download

The network of the International Criminal Court decisions as a complex system

Fabien Tarissan and Raphaëlle Nollez-Goldbach

ISCS 2013: Interdisciplinary Symposium on Complex Systems, Emergence, Complexity and Computation, 8:225-264, Springer, 2013.

Many real-world networks lend themselves to the use of graphs for analysing and modeling their structure. This approach has proved to be very useful for a wide variety of networks stemming from very different fields. Yet, only few papers focused their attention on legal networks. This paper intends precisely to remedy this situation by analysing a major legal network by means of complex system methods. The network under investigation is the network composed by decisions taken by the International Criminal Court since its creation. We first model the network by a simple directed graph in which nodes are the decisions and links represent citations between decisions. Our analysis shows that standard properties shared by common real networks are also present in this network. Then we turn to studying the network by means of bipartite graphs that involve both decisions and articles of law. We show that this two-level structure presents several non trivial properties and we show evidences of the relevance of the bipartite representation to explain properties observed in the graph of citations.

Download