Tabourier, L. and Stoica, A. and Peruani, F.
Communication Systems and Networks (COMSNETS), 2012 Fourth International Conference on
How to detect causality effects on large dynamical communication networks: A case study

Here we propose a set of dynamical measures to detect causality effects on communication datasets. Using appropriate comparison models, we are able to enumerate patterns containing causality relationships. This approach is illustrated on a large cellphone call dataset: we show that specific patterns such as short chain-like trees and directed loops are more frequent in real networks than in comparison models at short time scales. We argue that these patterns - which involve a node and its close neighborhood - constitute indirect evidence of active spreading of information only at a local level. This suggests that mobile phone networks are used almost exclusively to communicate information to a closed group of individuals. Furthermore, our study reveals that the bursty activity of the callers promotes larger patterns at small time scales.

Massoud Seifi, Jean-Loup Guillaume, Ivan Junier, Jean-Baptiste Rouquier and Svilen Iskrov
Proceedings of the 3rd Workshop on Complex Networks (CompleNet 2012), Melbourne, Florida
Stable community cores in complex networks

Complex networks are generally composed of dense sub-networks called communities. Many algorithms have been proposed to automatically detect such communities. However, they are often unstable and behave non-deterministically. We propose here to use this non-determinism in order to compute groups of nodes on which community detection algorithms agree most of the time.We show that these groups of nodes, called community cores, are more similar to Ground Truth than communities in real and arti cial networks. Furthermore, we show that in contrary to the classical approaches, we can reveal the absence of community structure in random graphs.

Massoud Seifi and Jean-Loup Guillaume
Proceedings of the Mining Social Network Dynamic 2012 Workshop (MSND), In conjunction with the international conference World Wide Web WWW 2012, Lyon, France, pp. 1173-1180
Community Cores in Evolving Networks

Community structure is a key property of complex networks. Many algorithms have been proposed to automatically detect communities in static networks but few studies have considered the detection and tracking of communities in an evolving network. Tracking the evolution of a given community over time requires a clustering algorithm that produces stable clusters. However, most community detection algorithms are very unstable and therefore unusable for evolving networks. In this paper, we apply the methodology proposed in [14] to detect what we call community cores in evolving networks. We show that cores are much more stable than "classical" communities and that we can overcome the disadvantages of the stabilized methods.

Bivas Mitra, Lionel Tabourier and Camille Roth
Computer Networks, Vol. 56(3), 2012.
Intrinsically dynamic communities from evolving, directed network data

Community finding algorithms for networks have recently been extended to dynamic data. Most of these recent methods aim at exhibiting community partitions from successive graph snapshots and thereafter connecting or smoothing these partitions using clever time-dependent features and sampling techniques. These approaches are nonetheless achieving longitudinal rather than dynamic community detection. We assume that commu- nities are fundamentally defined by the repetition of interactions among a set of nodes over time. According to this definition, analyzing the data by considering successive snapshots induces a significant loss of information: we suggest that it blurs essentially dynamic phe- nomena—such as communities based on repeated inter-temporal interactions, nodes switching from a community to another across time, or the possibility that a community survives while its members are being integrally replaced over a longer time period. We propose a formalism which aims at tackling this issue in the context of time-directed data- sets (such as citation networks), and present several illustrations on both empirical and synthetic dynamic networks. We eventually introduce intrinsically dynamic metrics to qualify temporal community structure and emphasize their possible role as an estimator of the quality of the community detection—taking into account the fact that various empir- ical contexts may call for distinct ‘community’ definitions and detection criteria.

Alice Albano, Jean-Loup Guillaume, and Bénédicte Le Grand
Proceedings of the  Mining Social Network Dynamic 2012 Workshop (MSND), In conjunction with the international conference World Wide Web WWW 2012, Lyon, France
File Diffusion in a Dynamic Peer-to-peer Network

Many studies have been made on diffusion in the field of epidemiology, and in the last few years, the development of social networking has induced new types of diffusion. In this paper, we focus on file diffusion on a peer-to-peer dynamic network using eDonkey protocol. On this network, we observe a linear behavior of the actual file diffusion. This result is interesting, because most diffusion models exhibit exponential behaviors. In this paper, we propose a new model of diffusion, based on the SI (Susceptible / Infected) model, which produces results close to the linear behavior of the observed diffusion. We then justify the linearity of this model, and we study its behavior in more details.

Matthieu Latapy, Clémence Magnien et Raphaël Fournier
To appear in Information Processing and Management, Feb 2012
Quantifying Paedophile Activity in a Large P2P System

Increasing knowledge of paedophile activity in P2P systems is a crucial societal concern, with important consequences on child protection, policy making, and internet regulation. Because of a lack of traces of P2P exchanges and rigorous analysis methodology, however, current knowledge of this activity remains very limited. We consider here a widely used P2P system, eDonkey, and focus on two key statistics: the fraction of paedophile queries entered in the system and the fraction of users who entered such queries. We collect hundreds of millions of keyword-based queries; we design a paedophile query detection tool for which we establish false positive and false negative rates using assessment by experts; with this tool and these rates, we then estimate the fraction of paedophile queries in our data; finally, we design and apply methods for quantifying users who entered such queries. We conclude that approximately 0.25% of queries are paedophile, and that more than 0.2% of users enter such queries. These statistics are by far the most precise and reliable ever obtained in this domain.

Amélie Medem, Clémence Magnien and Fabien Tarissan
Fourth International Workshop on Network Science for Communication Networks (NetSciCom), 2012
Impact of power-law topology on IP-level routing dynamics: simulation results

This paper focuses on the Internet IP-level routing topology and proposes relevant explanations to its apparent dynamics.We first represent this topology as a power-law random graph. Then, we incorporate to the graph two well known factors responsible for the observed dynamics, which are load balancing and route evolution. Finally, we simulate on the graph traceroute-like measurements. Repeating the process many times, we obtain several graph instances that we use to model the dynamics. Our results show that we are able to capture on power-law graphs the dynamic behaviors observed on the Internet. We find that the results on power-law graphs, while qualitatively similar to the one of  Erdös-Rényi  random graphs, highly differ quantitatively; for instance, the rate of discovery of new nodes in power-law graphs is extremely low compared to the rate in Erdös-Rényi graphs.

Oussama Allali, Lionel Tabourier, Clémence Magnien, Matthieu Latapy
To appear in Social Network Analysis and Mining
Internal links and pairs as a new tool for the analysis of bipartite complex networks

Many real-world complex networks are best modeled as bipartite (or 2-mode) graphs, where nodes are divided into two sets with links connecting one side to the other. However, there is currently a lack of methods to analyze properly such graphs as most existing measures and methods are suited to classical graphs. A usual but limited approach consists in deriving 1-mode graphs (called projections) from the underlying bipartite structure, though it causes important loss of information and data storage issues. We introduce here internal links and pairs as a new notion useful for a bipartite analysis, and which gives insights on the information lost by projecting the bipartite graph. We illustrate the relevance of theses concepts on several real-world instances illustrating how it enables to discriminate behaviors among various cases when we compare them to a benchmark of random graphs. Then, we show that we can draw benefit from this concept for both modeling complex networks and storing them in a compact format.

Fernando Peruani and Lionel Tabourier
PLoS ONE, Vol 6(12)
Directedness of Information Flow in Mobile Phone Communication Networks

Without having direct access to the information that is being exchanged, traces of information flow can be obtained by looking at temporal sequences of user interactions. These sequences can be represented as causality trees whose statistics result from a complex interplay between the topology of the underlying (social) network and the time correlations among the communications. Here, we study causality trees in mobile-phone data, which can be represented as a dynamical directed network. This representation of the data reveals the existence of super-spreaders and super-receivers. We show that the tree statistics, respectively the information spreading process, are extremely sensitive to the in-out degree correlation exhibited by the users. We also learn that a given information, e.g., a rumor, would require users to retransmit it for more than 30 hours in order to cover a macroscopic fraction of the system. Our analysis indicates that topological node-node correlations of the underlying social network, while allowing the existence of information loops, they also promote information spreading. Temporal correlations, and therefore causality effects, are only visible as local phenomena and during short time scales. Consequently, the very idea that there is (intentional) information spreading beyond a small vicinity is called into question. These results are obtained through a combination of theory and data analysis techniques.

Bénédicte Le Grand et Matthieu Latapy
Atelier EGC 2011 Visualisation et Extraction de Connaissances, Brest, France, Janvier 2011.
Détection visuelle d’événements dans des grands réseaux d’interaction dynamiques. Application à l’Internet

L’objectif des travaux présentés dans ce papier est de faciliter la détection visuelle d’événements dans des réseaux d’interaction dynamiques de grande taille.
Deux méthodes de visualisation classiques et «exhaustives» ont été étudiées, qui repré-sentent l’évolution des liens du réseau au fil du temps. Les limites liées au facteur d’échelle nous ont conduits à proposer deux métaphores restreintes au suivi des noeuds du réseau. Les forces, les limites et la complémentarité de ces quatre métaphores nous ont permis de déga-ger une ébauche de méthodologie de détection d’événements dans la dynamique de grands réseaux d’interaction.
Les visualisations et la méthodologie présentées dans cet article sont génériques et appli-cables à tout type de noeuds et de liens ; elles sont ici appliquées pour illustration à un sous-ensemble du réseau Internet.

Cassio Melo, Benedicte Le Grand, Marie-Aude Aufaure, and Anastazia Berezianos
IV’2011 conference on Information Visualization, London, July 2011.
Extracting and Visualizing Tree-Like Structures from Concept Lattices

Concept lattices built with Formal Concept Analysis are usually represented by Hasse diagrams illustrating the groupings of objects described by common attributes. Hasse diagrams display the relations of partial order between concepts in a hierarchical fashion, where each concept may have several parent concepts. Lattice visualization becomes a problem as the number of clusters grows significantly with the number of objects and attributes. Interpreting the lattice through a direct visualization of the line diagram rapidly becomes impossible and more synthetic representations are needed. In this work we propose several methods to enhance the readability of concept lattices firstly though colouring and distortion techniques, and secondly by extracting and visualizing trees derived from concept lattices structures.

Abdelhamid Salah brahim, Bénédicte Le Grand, Lionel Tabourier, Matthieu Latapy
Journal of Computational Science, Vol 2(3), 2011
Citations among blogs in a hierarchy of communities: method and case study

How does the structure of a network (e.g. its organization into groups or communities) impact the interaction among its nodes? In this paper we propose a generic methodology to study the correlation between complex networks interactions and their community structure. We illustrate it on a blog network and focus on citation links. We first define a homophily probability evaluating the tendency of blogs to ite blogs from the same communities. We then introduce the notion of community distance to capture if a blog cites (or is cited by) blogs distant or not from its community. We analyze the distribution of distances corresponding to each citation link, and use it to build maps of relevant communities which help interpreting blogs interactions. This community-oriented approach allows to study citation links at various abstraction levels, and conversely, enable us to characterize communities with regard to their citation behaviour.

Thomas Aynaud and Jean-Loup Guillaume
in Proceedings of MARAMI 2011
Extraction hiérarchique de fenêtres de temps basée sur la structure communautaire

Dans cet article nous décrivons une méthode de décomposition du temps en fenêtres de temps dans un graphe dynamique. Une particularité de la méthode est que le résultat est un regroupement hiérarchique : les fenêtres de temps sont elles-mêmes susceptibles d'en contenir. En outre, les fenêtres n'ont pas besoin d'être contiguës ce qui permet par exemple de détecter une structure se répétant. De plus, chaque fenêtre est associée à une décomposition en communautés représentant la structure topologique du réseau durant cette fenêtre. Nous appliquons ensuite cette méthode à trois graphes de terrain dynamiques ayant des caractéristiques différentes pour montrer que les fenêtres identifiées correspondent bien à des phénomènes observables. In this paper, we describe a way to cluster the time in time windows in a dynamic network. The result is a tree and thus time windows can themselves contain smaller ones. Moreover, the windows do not have to be consecutive and this allows for instance to detect repeated structure. Each window is also associated to a community decomposition that represents the topological structure of the network during this window. We then apply the method to three dynamic networks to show that observed time windows correspond to observable phenomena.

Alice Albano
JFGG 2011
Phénomènes de diffusion dans les réseaux dynamiques : simulation et modélisation

Les phénomènes de diffusion sont présents dans de nombreux contextes: diffusion d’épidémies, de virus informatiques, d’information dans des réseaux sociaux, etc. Bien que les réseaux où se produit la diffusion soient souvent dynamiques, cette dynamique n’est pas prise en compte dans la plupart des modèles existants. L’objectif de ces travaux est de proposer des modèles de diffusion, et d’étudier l’impact de la dynamique du réseau sur la diffusion.

Oussama Allali, Clémence Magnien and Matthieu Latapy
To appear in Dynamic Networks and Knowledge Discovery, special issue of Intelligent Data Analysis.
Internal links prediction: a new approach for predicting links in bipartite graphs

Many real-world complex networks, like actor-movie or file-provider relations, have a bipartite nature and evolve over time. Predicting links that will appear in them is one of the main approach to understand their dynamics. Only few works address the bipartite case, though, despite its high practical interest and the specific challenges it raises. We define in this paper the notion of internal links in bipartite graphs and propose a link prediction method based on them. We thoroughly describe the method and its variations, and experimentally compare it to a basic collaborative filtering approach. We present results obtained for a typical practical case. We reach the conclusion that our method performs very well, and we study in details how its parameters may influence obtained results.

Thomas Aynaud and Jean-Loup Guillaume
proceedings of the Fifth SNA-KDD Workshop Social Network Mining and Analysis, in conjunction with the 17th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD 2011)
Multi-Step Community Detection and Hierarchical Time Segmentation in Evolving Networks

Many complex systems composed of interacting objects  like social networks or the web can be modeled as graphs. They can usually be divided in dense sub-graphs with few links between them, called communities and detecting this underlying community structure may have a major impact in the understanding of these systems. We focus here on evolving graphs, for which the usual approach is to represent the state of the system at different time steps and to compute communities independently on the graph obtained at each time step. We propose in this paper to use a different framework: instead of detecting communities on each time step, we detect a unique decomposition in communities that is relevant for (almost) every time step during a given period called the time window.  We propose a definition of this new decomposition and two algorithms to detect it quickly. We validate both the approach and the algorithms on three evolving networks of different kinds showing that the quality loss at each time step is very low despite the constraint of maximization on several time steps. Since the time window length is a crucial parameter of our technique, we also propose an unsupervised hierarchical clustering algorithm to build automatically a hierarchical time segmentation into time windows. This clustering relies on a new similarity measure based on community structure. We show that it is very efficient in detecting meaningful windows.

Matthieu Latapy, Clémence Magnien and Frédéric Ouédraogo
Complex Systems, 20 (1), 23-30, 2011.
A Radar for the Internet

Mapping the internet's topology is a challenge in itself, and studying its dynamics is even more difficult. Achieving this would however provide key information on the nature of the internet, crucial for modeling and simulation. Moreover, detecting anomalies in this dynamics is a key issue for security. We introduce here a new measurement approach which makes it possible to capture internet dynamics at a scale of a few minutes in a radar-like manner. By conducting and analyzing large-scale measurements of this kind, we rigorously and automatically detect events in the observed dynamics, which is totally out of reach of previous approaches.

Oussama Allali, Clémence Magnien and Matthieu Latapy
Proceedings of the third International Workshop on Network Science for Communication Networks (Netscicom 2011), In conjunction with IEEE Infocom 2011.
Link prediction in bipartite graphs using internal links and weighted projection

Many real-world complex networks, like client-product or file-provider relations, have a bipartite nature and evolve during time. Predicting links that will appear in them is one of the main approach to understand their dynamics. Only few works address the bipartite case, though, despite its high practical interest and the specific challenges it raises. We define in this paper the notion of internal links in bipartite graphs and propose a link prediction method based on them. We describe the method and experimentally compare it to a basic collaborative filtering approach. We present results obtained for two typical practical cases. We reach the conclusion that our method performs very well, and that internal links play an important role in bipartite graphs and their dynamics.

Matthieu Latapy, Clémence Magnien and Raphaël Fournier
Quantifying paedophile queries in a large P2P system

Increasing knowledge of paedophile activity in P2P systems is a crucial societal concern, with important consequences on child protection, policy making, and internet regulation. Because of a lack of traces of P2P exchanges and rigorous analysis methodology, however, current knowledge of this activity remains very limited. We consider here a widely used P2P system, eDonkey, and focus on two key statistics: the fraction of paedophile queries entered in the system and the fraction of users who entered such queries. We collect hundreds of millions of keyword-based queries; we design a paedophile query detection tool for which we establish false positive and false negative rates using assessment by experts; with this tool and these rates, we then estimate the fraction of paedophile queries in our data; finally, we design and apply methods for quantifying users who entered such queries. We conclude that approximately 0.25 % of queries are paedophile, and that more than 0.2 % of users enter such queries. These statistics are by far the most precise and reliable ever obtained in this domain.

Thomas Aynaud, Vincent Blondel, Jean-Loup Guillaume and Renaud Lambiotte
in Partitionnement de graphe : optimisation et applications, Traité IC2, Hermes-Lavoisier 2011
Optimisation locale multi-niveaux de la modularité

Dans ce chapitre, nous présentons une méthode gloutonne pour optimiser la modularité d'un graphe. Cette méthode de partionnement permet de traiter avec une excellente précision des systèmes de taille inégalée, allant jusqu'à plusieurs milliards de liens. Notre algorithme a de surcroît l'avantage de ne pas être limité à l'optimisation de la modularité puisqu'il peut être généralisé à d'autres fonctions de qualité, et de découvrir des communautés à différentes échelles. Les performances de l'algorithme sont évaluées sur des graphes artificiels pour lesquels la structure communautaire est connue, ainsi que sur des graphes de terrain réels.

Thomas Aynaud and Jean-Loup Guillaume
Latin-American Workshop on Dynamic Networks (LAWDN), Buenos Aires, 2010
Long range community detection

Complex networks can usually be divided in dense subnetworks called communities. In evolving networks, the usual way to detect communities is to find several partitions independently, one for each time step. However, this generally causes troubles when trying to track communities from one time step to the next. We propose here a new method to detect only one decomposition in communities that is good for (almost) every time step. We show that this unique partition can be computed with a modification of the Louvain method and that the loss of quality at each time step is generally low despite the constraint of global maximization. We also show that some specific modifications of the networks topology can be identified using this unique partition in the case of the Internet topology.

Thomas Aynaud and Jean-Loup Guillaume
Journée thématique Fouille de grands graphes, en conjonction avec la première conférence sur les Modèles et l’Analyse des Réseaux : Approches Mathématiques et Informatique (MARAMI), Toulouse, France, 2010
Détection de communautés à long terme dans les graphes dynamiques

La plupart des graphes de terrain peuvent être décomposés en sous graphes denses appelés communautés. Habituellement, dans des graphes dynamiques, les communautés sont détectées pour chaque instant indépendamment ce qui pose de nombreux problèmes tels que la stabilité ou le suivi de des communautés entre deux décompositions successives. Nous proposons ici une méthode pour trouver une partition unique, de qualité, couvrant une longue période. Cette décomposition peut être trouvée efficacement via une adaptation de la méthode de Louvain et la perte de qualité à chaque instant due à la contrainte de détecter des communautés globales s’avère assez faible.

Christophe Crespelle and Fabien Tarissan
in Complex Networks, special issue of Computer Communications, 34-5, pp. 635-648 (2011), DOI: 10.1016/j.comcom.2010.06.006
Evaluation of a new method for measuring the internet degree distribution: Simulation results

Many contributions rely on the degree distribution of the Internet topology. However, current knowledge of this property is based on biased and erroneous measurements and is subject to much debate. Recently, a new approach, referred to as the Neighborhood Flooding method, was proposed to avoid issues raised by classical measurements. It aims at measuring the neighborhood of Internet core routers by sending traceroute probes from many monitors distributed in the Internet towards a given target router. In this paper, we investigate the accuracy of this method with simulations. Our results show that Neighborhood Flooding is free from the bias highlighted in the classical approach and is able to observe properly the exact degree of a vast majority of nodes in the core of the network. We show how the quality of the estimation depends on the number of monitors used and we carefully examine the influence of parameters of the simulations on our results. We also point out some limitations of the Neighborhood Flooding method and discuss their impact on the observed distribution.

Thomas Aynaud and Jean-Loup Guillaume
Proceedings of International Workshop on Dynamic Networks (WDN), in conjunction with WiOpt 2010, pages 508-514
Static community detection algorithms for evolving networks

Complex networks can often be divided in dense sub-networks called communities. We study, using a partition edit distance, how three community detection algorithms transform their outputs if the input network is sligthly modified. The instabilities appear to be important and we propose a modification of one algorithm to stabilize it and to allow the tracking of the communities in a dynamic network. This modification has one parameter which is a tradeoff between stability and quality. The resulting algorithm appears to be very effective. We finally use it on a dynamic network of blogs.

Frédéric Ouédraogo, Clémence Magnien
Computer Communications, 34, 670-679, 2011
Impact of Sources and Destinations on the Observed Properties of the Internet Topology

Maps of the internet topology are generally obtained by measuring the routes from a given set of sources to a given set of destinations (with tools such as traceroute). It has been shown that this approach misses some links and nodes. Worse, in some cases it can induce a bias in the obtained data, i.e. the properties of the obtained maps are significantly different from those of the real topology. In order to reduce this bias, the general approach consists in increasing the number of sources. Some works have studied the relevance of this approach. Most of them have used theoretical results, or simulations on network models. Some papers have used real data obtained from actual measurement procedures to evaluate the importance of the number of sources and destinations, but no work to our knowledge has studied extensively the importance of the choice of sources or destinations. Here, we use real data from internet topology measurements to study this question: by comparing partial measurements to our complete data, we can evaluate the impact of adding sources or destinations on the observed properties.

We show that the number of sources and destinations used plays a role in the observed properties, but that their choice, and not only their number, also has a strong influence on the observations. We then study common statistics used to describe the internet topology, and show that they behave differently: some can be trusted once the number of sources and destinations are not too small, while others are difficult to evaluate.

Alina Stoica, Zbigniew Smoreda, Christophe Prieur and Jean-Loup Guillaume
Proceedings of the Workshop on the Analysis of Mobile Phone Networks, satellite workshop to NetSci 2010
Age, Gender and Communication Networks

In this paper, we address some sociological  and topological issues associated with mobile phone communication. Based on a dataset of a few million users, we use customers' age and gender information  to study relation between these parameters and the average behavior of users in terms of number of calls, number of SMS and calls duration. We  also study the dataset from a networking point of view: we define different profiles based on the topological properties of the  personal network of each individual and study the relations between these profiles and the age of customers

Thomas Aynaud and Jean-Loup Guillaume
in Technique et Science Informatiques, vol. 30/2, pp. 137-154, 2011
Structure multi-échelle de grands graphes de terrain

Most complex networks can be divided into dense sub-graphs called communities. These communities may also be divided recursively producing a hierarchical structure of communities, summarized in a tree named dendrogram. In this article we analyze this structure extracted from several complex networks. First we study the shape of the tree and, in particular, communities imbrications. Then we show that an excessive decomposition of communities can result in meaningless communities. We propose a couple of approaches to solve this problem.

Assia Hamzaoui, Matthieu Latapy and Clémence Magnien
Proceedings of International Workshop on Dynamic Networks (WDN), in conjunction with WiOpt 2010
Detecting Events in the Dynamics of Ego-centered Measurements of the Internet Topology

Detecting events such as major routing changes or congestions in the dynamics of the internet topology is an important but challenging task. We explore here a top-down approach based on a notion of statistically significant events. It consists in identifying statistics which exhibit a homogeneous distribution with outliers, which correspond to events. We apply this approach to ego-centerd measurements of the internet topology (views obtained from a single monitor) and show that it succeeds in detecting meaningful events. Finally, we give some hints for the interpretation of such events in terms of network events.

Lamia Benamara and Clémence Magnien
Proceedings of the third International Workshop on Network Science for Communication Networks (NetSciCom 2010), In conjunction with IEEE Infocom 2010.
Estimating properties in dynamic systems: the case of churn in P2P networks

In many systems, such as P2P systems, the dynamicity of participating elements, or churn, has a strong impact. As a consequence, many efforts have been made to characterize it, and in particular to capture the session length distribution. However in most cases, estimating it rigorously is difficult. One of the reasons is that, because the observation window is by definition finite, parts of the sessions that begin before the window and/or end after it are missed. This induces a bias. Although it tends to decrease when the observation window length increases, it is difficult to quantify its importance, or how fast it decreases.

Here, we introduce a general methodology that allows us to know if the observation window is long enough to characterize a given property. This methodology is not specific to one study case and may be applied to any property in a dynamic system. We apply this methodology to the study of session lengths in a massive measurement of P2P activity in the eDonkey system. We show that the measurement needs to last for at least one week in order to obtain representative results. We also show that our methodology allows us to precisely characterize the shape of the session length distribution.

Christophe Crespelle, Matthieu Latapy, Elie Rotenberg
Proceedings of the third International Workshop on Network Science for Communication Networks (NetSciCom 2010), In conjunction with IEEE Infocom 2010.
Rigorous measurement of IP-level neighborhood of Internet core routers

Many contributions use the degree distribution of IP-level internet topology. However, current knowledge of this property relies on biased and erroneous measurements, and so it is subject to much debate. We introduce here a new approach, dedicated to the core of the internet, which avoids the issues raised by classical measurements. It is based on the measurement of IP-level neighborhood of internet core routers, for which we design and implement a rigorous method. It consists in sending traceroute probes from many monitors distributed in the internet towards a given target router and carefully selecting the relevant information in collected data. Using simulations, we provide strong evidence of the accuracy of our approach. We then conduct real-world measurements illustrating the practical effectiveness of our method. This constitutes a significant step towards reliable knowledge of the IP-level degree distribution of the core of the internet.

Massoud Seifi, Jean-loup Guillaume, Matthieu Latapy and Bénédicte Le Grand
Proceedings of the 8th Workshop on Visualization and Knowledge Extraction (EGC 2010)
Interactive multiscale visualization of huge graphs: application to a network of weblogs

De nombreux réseaux du monde réel peuvent être modélisés par des grands graphes. Réduire la complexité d'un graphe de manière à ce qu'il puisse être facilement interprété par l'oeil humain est une aide précieuse pour comprendre et analyser ce type de données. Nous proposons une méthodologie de visualisation interactive multi-échelle de grands graphes basée sur une classification des sommets qui nous permet de représenter ces graphes de manière lisible et interprétable. Nous appliquons notre méthodologie à un réseau de blogs francophones.

Clémence Magnien, Matthieu Latapy and Jean-Loup Guillaume
ACM Computing Surveys, 43 (3), 2011. Extended abstract published in LNCS, proceedings of the 8-th International Conference On Principles Of Distributed Systems OPODIS'04, 2004, Grenoble, France (title: Comparison of Failures and Attacks on Random and Scale-free Networks)
Impact of Random Failures and Attacks on Poisson and Power-Law Random Networks

It appeared recently that the underlying degree distribution of a network may play a crucial role concerning its robustness. Empiric and analytic results have been obtained, based on asymptotic and mean-field approximations. Previous work insisted on the fact that power-law degree distributions induce a high resilience to random failure but a high sensitivity to some attack strategies, while Poisson degree distributions are quite sensitive in both cases. Then, much work has been done to extend these results. We focus here on these basic results, with the aim of deepening significantly our understanding of their origin and their limitations. We review in detail previous contributions and we give full proofs in a unified framework, in which the approximations on which these results rely are well identified. We then add to these known results a set of new ones aimed at enlightening some important aspects. We also provide extensive and rigorous experiments which make it possible to evaluate the relevance of the analytic results. We reach the conclusion that, even if it is clear that the basic results of the field are true and important, they are in practice much less striking than generally thought. The differences between random failures and attacks are not so huge and can be explained with simple facts. Likewise, the differences in the behaviors induced by power-law and Poisson distributions are not as striking as often claimed.

Guillaume Valadon, Clémence Magnien and Ryuji Wakikawa
Mobile IPv6 Deployments: Graph-based Analysis and practical Guidelines

The Mobile IPv6 protocol is a major solution to supply mobility services on the Internet. Many networking vendors have already implemented it in their operating systems and equipments. Moreover, it was recently selected to provide permanent IP addresses to end users of WiMAX and 3GPP2. Mobile IPv6 relies on a specific router called the home agent that hides location changes of the mobile nodes from the rest of the Internet. To do so, the mobile nodes' traffic must flow through the home agent. This mandatory deviation produces longer paths and higher communication delays. In order to solve these problems, we describe a new approach to address deployments of Mobile IPv6 based on graph theory and could be applied to any operator's network. In particular, we use notions of centrality in graphs to quantify increases of communication distances induced by dogleg routing and identify relevant home agents locations. We evaluate this approach using real-world network topologies and show that the obtained Mobile IPv6 performance could be close to direct paths ones. The proposed algorithm is generic and can be used to achieve efficient deployments of Mobile IPv6 as well as Home Agent Migration: a new Mobile IPv6 architecture using several distributed home agents.

Fabien Tarissan, Matthieu Latapy and Christophe Prieur
In Proceedings of the IEEE International Workshop on Network Science For Communication Networks (NetSciCom'09)
Efficient Measurement of Complex Networks Using Link Queries

Complex networks are at the core of an intense research activity. However, in most cases, intricate and costly measurement procedures are needed to explore their structure. In some cases, these measurements rely on link queries: given two nodes, it is possible to test the existence of a link between them. These tests may be costly, and thus minimizing their number while maximizing the number of discovered links is a key issue. This is a challenging task, though, as initially no information is known on the network. This paper studies this problem: we observe that properties classically observed on real-world complex networks give hints for their efficient measurement; we derive simple principles and several measurement strategies based on this, and experimentally evaluate their efficiency on real-world cases. In order to do so, we introduce methods to evaluate the efficiency of strategies. We also explore the bias that different measurement strategies may induce.

Frédéric Aidouni, Matthieu Latapy and Clémence Magnien
Sixth International Workshop on Hot Topics in Peer-to-Peer Systems (Hot-P2P 2009), May 29, 2009, Rome, Italy
Ten weeks in the life of an eDonkey server

This paper presents a capture of the queries managed by an eDonkey server during almost 10 weeks, leading to the observation of almost 9 billion messages involving almost 90 million users and more than 275 million distinct files. Acquisition and management of such data raises several challenges, which we discuss as well as the solutions we developed. We obtain a very rich dataset, orders of magnitude larger than previously avalaible ones, which we provide for public use. We finally present basic analysis of the obtained data, which already gives evidence of non-trivial features.

Clémence Magnien, Frédéric Ouedraogo, Guillaume Valadon, Matthieu Latapy
Fourth International Conference on Internet Monitoring and Protection (ICIMP 2009), May 24-28, 2009, Venice, Italy
Fast dynamics in Internet topology: preliminary observations and explanations

By focusing on what can be observed by running traceroute-like measurements at a high frequency from a single monitor to a fixed destination set, we show that the observed view of the topology is constantly evolving at a pace much higher than expected. Repeated measurements discover new IP addresses at a constant rate, for long period of times (up to several months). In order to provide explanations, we study this phenomenon both at the IP, and at the Autonomous System levels. We show that this renewal of IP addresses is partially caused by a BGP routing dynamics, altering paths between existing ASes. Furthermore, we conjecture that an intra AS routing dynamics is another cause of this phenomenon.

Oussama Allali, Matthieu Latapy and Clémence Magnien
Sixth International Workshop on Hot Topics in Peer-to-Peer Systems (Hot-P2P 2009), May 29, 2009, Rome, Italy
Measurement of eDonkey Activity with Distributed Honeypots

Collecting information about user activity in peer-to-peer systems is a key but challenging task. We describe here a distributed platform for doing so on the eDonkey network, relying on a group of honeypot peers which claim to have certain files and log queries they receive for these files. We then conduct some measurements with typical scenarios and use the obtained data to analyze the impact of key parameters like measurement duration, number of honeypots involved, and number of advertised files. This illustrates both the possible uses of our measurement system, and the kind of data one may collect using it.

Bénédicte Le Grand, Marie-Aude Aufaure et Michel Soto
Conférence EGC 2009 (Extraction et Gestion des Connaissances), Strasbourg, France, 27-30 janvier 2009
Empreintes conceptuelles et spatiales pour la caractérisation des réseaux sociaux

In this paper, Formal Concept Analysis and Galois lattices are used for the analysis of complex datasets, online social networks in particular. Lattice-inspired statistics computed on the objects of the lattice provide their "conceptual distribution". An experimentation conducted on four social networks' samples shows how these statistics may be used to characterize these networks and filter them automatically.

Matthieu Latapy
Theoretical Computer Science (TCS) 407 (1-3), pages 458-473, 2008
Main-memory Triangle Computations for Very Large (Sparse (Power-Law)) Graphs

Finding, counting and/or listing triangles (three vertices with three edges) in massive graphs are natural fundamental problems, which received recently much attention because of their importance in complex network analysis. We provide here a detailed survey of proposed main-memory solutions to these problems, in an unified way. We note that previous authors paid surprisingly little attention to space complexity of main-memory solutions, despite its both fundamental and practical interest. We therefore detail space complexities of known algorithms and discuss their implications. We also present new algorithms which are time optimal for triangle listing and beats previous algorithms concerning space needs. They have the additional advantage of performing better on power-law graphs, which we also detail. We finally show with an experimental study that these two algorithms perform very well in practice, allowing to handle cases which were previously out of reach.

Matthieu Latapy, Clémence Magnien and Frédéric Ouédraogo
Proceedings of ADN'08: 1st International Workshop on Analysis of Dynamic Networks, in conjunction with IEEE ICDM 2008
A Radar for the Internet

In contrast with most internet topology measurement research, our concern here is not to obtain a map as complete and precise as possible of the whole internet. Instead, we claim that each machine's view of this topology, which we call ego-centered view, is an object worth of study in itself. We design and implement an ego-centered measurement tool, and perform radar-like measurements consisting of repeated measurements of such views of the internet topology. We conduct long-term (several weeks) and high-speed (one round every few minutes) measurements of this kind from more than one hundred monitors, and we provide the obtained data. We also show that these data may be used to detect events in the dynamics of internet topology.

Clémence Magnien, Matthieu Latapy and Michel Habib
ACM Journal of Experimental Algorithmics (JEA), 13, 2009
Fast Computation of Empirically Tight Bounds for the Diameter of Massive Graphs

The diameter of a graph is among its most basic parameters. Since a few years, it moreover became a key issue to compute it for massive graphs in the context of complex network analysis. However, known algorithms, including the ones producing approximate values, have too high a time and/or space complexity to be used in such cases. We propose here a new approach relying on very simple and fast algorithms that compute (upper and lower) bounds for the diameter. We show empirically that, on various real-world cases representative of complex networks studied in the literature, the obtained bounds are very tight (and even equal in some cases). This leads to rigorous and very accurate estimations of the actual diameter in cases which were previously untractable in practice.

Vincent D. Blondel, Jean-Loup Guillaume, Renaud Lambiotte, Etienne Lefebvre
J. Stat. Mech. (october 2008) P10008
Fast unfolding of communities in large networks

We propose a simple method to extract the community structure of large networks. Our method is a heuristic method that is based on modularity optimization. It is shown to outperform all other known community detection method in terms of computation time. Moreover, the quality of the communities detected is very good, as measured by the so-called modularity. This is shown first by identifying language communities in a Belgian mobile phone network of 2.6 million customers and by analyzing a web graph of 118 million nodes and more than one billion links. The accuracy of our algorithm is also verified on ad-hoc modular networks

Pierre Borgnat, Eric Fleury, Jean-Loup Guillaume, Céline Robardet and Antoine Scherrer
Computer Networks 52 (2008), pp. 2842-2858
Description and simulation of dynamic mobility networks

During the last decade, the study of large scale complex networks has attracted a substantial amount of attention and works from several domains: sociology, biology, computer science, epidemiology. Most of such complex networks are inherently dynamic, with new vertices and links appearing while some old ones disappear. Until recently, the dynamic of these networks was less studied and there is a strong need for dynamic network models in order to sustain protocol performance evaluations and fundamental analyzes in all the research domains listed above. We propose in this paper a novel framework for the study of dynamic mobility networks. We address the characterization of dynamics by proposing an in-depth description and analysis of two real-world data sets. We show in particular that links creation and deletion processes are independent of other graph properties and that such networks exhibit a large number of possible configurations, from sparse to dense. From those observations, we propose simple yet very accurate models that allow to generate random mobility graphs with similar temporal behavior as the one observed in experimental data.

Pierre Borgnat, Eric Fleury, Jean-Loup Guillaume, Clémence Magnien, Céline Robardet et Antoine Scherrer
Proceedings of NATO Advanced Study Institute on Mining Massive Data Sets for Security, IOS Press, 2008
Evolving networks

Most real networks often evolve through time: changes of topology can occur if some nodes and/or edges appear and/or disappear, and the types or weights of nodes and edges can also change even if the topology stays static. Mobile devices with wireless capabilities (mobile phones, laptops, etc.) are a typical example of evolving networks where nodes or users are spread in the environment and connections between users can only occur if they are near each other. This whois- near-whom network evolves every time users move and communication services (such as the spread of any information) will deeply rely on the mobility and on the characteristics of the underlying network. This paper presents some recent results concerning the characterization of the dynamics of complex networks through three different angles: evolution of some parameters on snapshots of the network, parameters describing the evolution itself, and intermediate approaches consisting in the study of specific phenomena or users of interest through time.

Matthieu Latapy, Clémence Magnien and Nathalie Del Vecchio
Social Networks (2008), vol. 30, no1, pp. 31-48
Basic Notions for the Analysis of Large Two-mode Networks

Many large real-world networks actually have a 2-mode nature: their nodes may be separated into two classes, the links being between nodes of different classes only. Despite this, and despite the fact that many ad-hoc tools have been designed for the study of special cases, very few exist to analyse (describe, extract relevant information) such networks in a systematic way. We propose here an extension of the most basic notions used nowadays to analyse large 1-mode networks (the classical case) to the 2-mode case. To achieve this, we introduce a set of simple statistics, which we discuss by comparing their values on a representative set of real-world networks and on their random versions. This makes it possible to evaluate their relevance in capturing properties of interest in 2-mode networks.

Fabien Viger, Brice Augustin, Xavier Cuvellier, Clémence Magnien, Matthieu Latapy, Timur Friedman, and Renata Teixeira
Computer Networks 52-5 (2008), pp. 998-1018. Extended abstract published in the proceedings of the 6-th Internet Measurement Conference IMC'06, 2006, Rio de Janeiro, Brazil
Detection, Understanding, and Prevention of Traceroute Measurement Artifacts

Traceroute is widely used: from the diagnosis of network problems to the assemblage of internet maps. Unfortunately, there are a number of problems with traceroute methodology, which lead to the inference of erroneous routes. This paper studies particular structures arising in nearly all traceroute measurements. We characterize them as "loops", "cycles", and "diamonds". We identify load balancing as a possible cause for the appearance of false loops, cycles and diamonds, i.e., artifacts that do not represent the internet topology. We provide a new publicly-available traceroute, called Paris traceroute, which, by controlling the packet header contents, provides a truer picture of the actual routes that packets follow. We performed measurements, from the perspective of a single source tracing towards multiple destinations, and Paris traceroute allowed us to show that many of the particular structures we observe are indeed traceroute measurement artifacts.

Matthieu Latapy and Clémence Magnien
Infocom'08 Proceedings, Phoenix, USA
Complex Network Measurements: Estimating the Relevance of Observed Properties

Complex networks, modeled as large graphs, received much attention during these last years. However, data on such networks is only available through intricate measurement procedures. Until recently, most studies assumed that these procedures eventually lead to samples large enough to be representative of the whole, at least concerning some key properties. This has crucial impact on network modeling and simulation, which rely on these properties. Recent contributions proved that this approach may be misleading, but no solution has been proposed. We provide here the first practical way to distinguish between cases where it is indeed misleading, and cases where the observed properties may be trusted. It consists in studying how the properties of interest evolve when the sample grows, and in particular whether they reach a steady state or not. In order to illustrate this method and to demonstrate its relevance, we apply it to data-sets on complex network measurements that are representative of the ones commonly used. The obtained results show that the method fulfills its goals very well. We moreover identify some properties which seem easier to evaluate in practice, thus opening interesting perspectives.

Vincent D. Blondel, Jean-Loup Guillaume, Julien M. Hendrickx, Cristobald de Kerchove and Renaud Lambiotte
Phys. Rev. E 77, 036114 (2008)
Local leaders in random networks

We consider local leaders in random uncorrelated networks, i.e., nodes whose degree is higher than or equal to the degree of all their neighbors. An analytical expression is found for the probability for a node of degree k to be a local leader. This quantity is shown to exhibit a transition from a situation where high-degree nodes are local leaders to a situation where they are not, when the tail of the degree distribution behaves like the power law ~k^gamma_c with gamma_c=3. Theoretical results are verified by computer simulations, and the importance of finite-size effects is discussed.

Vincent D. Blondel, Jean-Loup Guillaume, Julien M. Hendrickx, and Raphaël M. Jungers
Phys. Rev. E 76, 066101 (2007)
Distance distribution in random graphs and application to network exploration

We consider the problem of determining the proportion of edges that are discovered in an Erdős–Rényi graph when one constructs all shortest paths from a given source node to all other nodes. This problem is equivalent to the one of determining the proportion of edges connecting nodes that are at identical distance from the source node. The evolution of this quantity with the probability of existence of the edges exhibits intriguing oscillatory behavior. In order to perform our analysis, we introduce a different way of computing the distribution of distances between nodes. Our method outperforms previous similar analyses and leads to estimates that coincide remarkably well with numerical simulations. It allows us to characterize the phase transitions appearing when the connectivity probability varies.

Rémi Dorat, Matthieu Latapy, Bernard Conein and Nicolas Auray
Annals of Telecommunications (2007), vol. 62, no3-4, pp. 325-349
Multi-level analysis of an interaction network between individuals in a mailing-list

It is well known now that most real-world complex networks have some properties which make them very different from random networks. In the case of interactions between authors of messages in a mailing-list, however, a multi-level structure may be responsible for some of these properties. We propose here a rigorous but simple formalism to investigate this question, and we apply it to an archive of the Debian user mailing-list. This leads to the identification of some properties which may indeed be explained this way, and of some properties which need deeper analysis.

Jérémie Leguay, Matthieu Latapy, Timur Friedman, Kavé Salamatian
Computer Networks 51, pages 2067-2087, 2007. Extended abstract published in LNCS, proceedings of the 4-th IFIP international conference on Networking, 2005, Waterloo, Canada
Describing and simulating routes on the Internet

This contribution deals with actual routes followed by packets on the internet at IP level. We first propose a set of statistical properties to analyse such routes, which brings detailed information on them. We then use the obtained results to suggest and evaluate methods for generating artificial routes suitable for simulation purposes. This also makes it possible to evaluate various network models. This work is based on large data sets provided mainly by CAIDA's skitter infrastructure.

Ravi Kumar and Matthieu Latapy
Theoretical Computer Science, special issue on Complex Networks, Volume 355, Number 1, 6 April 2006
Theoretical Computer Science special issue on Complex Networks – foreword

Jean-Loup Guillaume and Matthieu Latapy
Physica A 371, pages 795-813, 2006. Extended abstract published in LNCS, proceedings of the 1-st international workshop on Combinatorial and Algorithmic Aspects of Networking CAAN'04, 2004, Banff, Canada
Bipartite graphs as Models of Complex Networks

It appeared recently that the classical random graph model used to represent real-world complex networks does not capture their main properties. Since then, various attempts have been made to provide accurate models. We study here the first model which achieves the following challenges: it produces graphs which have the three main wanted properties (clustering, degree distribution, average distance), it is based on some real-world observations, and it is sufficiently simple to make it possible to prove its main properties. This model consists in sampling a random bipartite graph with prescribed degree distribution. Indeed, we show that any complex network can be viewed as a bipartite graph with some specific characteristics, and that its main properties can be viewed as consequences of this underlying structure. We also propose a growing model based on this observation.

Anne-Ruxandra Carvunis, Matthieu Latapy, Annick Lesne, Clémence Magnien and Laurent Pezard
Physica A 367, pages 585-612, 2006
Dynamics of three-state excitable units on Poisson vs power-law random networks

The influence of the network topology on the dynamics of systems of coupled excitable units is studied numerically and demonstrates a lower dynamical variability for power-law networks than for Poisson ones. This effect which reflects a robust collective excitable behavior is however lower than that observed for diffusion processes or network robustness. Instead, the presence (and number) of triangles and larger loops in the networks appears as a parameter with strong influence on the considered dynamics.

Pascal Pons and Matthieu Latapy
Journal of Graph Algorithms and Applications (JGAA) vol. 10, no. 2, pages 191-218, 2006. Extended abstract published in LNCS, proceedings of the 20-th International Symposium on Computer and Information Sciences ISCIS'05, 2005, Istambul, Turquie
Computing communities in large networks using random walks

Dense subgraphs of sparse graphs (communities), which appear in most real-world complex networks, play an important role in many contexts. Computing them however is generally expensive. We propose here a measure of similarities between vertices based on random walks which has several important advantages: it captures well the community structure in a network, it can be computed efficiently, and it can be used in an agglomerative algorithm to compute efficiently the community structure of a network. We propose such an algorithm, called Walktrap, which runs in time O(mn²) and space O(n²) in the worst case, and in time O(n² log n) and space O(n²) in most real-world cases (n and m are respectively the number of vertices and edges in the input graph). Extensive comparison tests show that our algorithm surpasses previously proposed ones concerning the quality of the obtained community structures and that it stands among the best ones concerning the running time.

Stevens Le Blond, Jean-Loup Guillaume and Matthieu Latapy
LNCS, proceedings of the 4-th International workshop on Peer-to-Peer Systems IPTPS'05, 2005, Ithaka, New York, USA
Clustering in P2P exchanges and consequences on performances

We propose here an analysis of a rich dataset which gives an exhaustive and dynamic view of the exchanges processed in a running eDonkey system. We focus on correlation in term of data exchanged by peers having provided or queried at least one data in common. We introduce a method to capture these correlations (namely the data clustering), and study it in detail. We then use it to propose a very simple and efficient way to group data into clusters and show the impact of this underlying structure on search in typical P2P systems. Finally, we use these results to evaluate the relevance and limitations of a model proposed in a previous publication. We indicate some realistic values for the parameters of this model, and discuss some possible improvements.

Pierre Fraigniaud, Philippe Gauron and Matthieu Latapy
LNCS, proceedings of the 11-th international conference Euro-Par, 2005, Lisbonne, Portugal
Combining the use of clustering and scale-free nature of user exchanges into a simple and efficient P2P system

It appeared recently that user interests in a P2P system possess clustering properties that may be used to reduce significantly the amount of traffic of flooding-based search strategies. It was also observed that they possess scale-free properties that may be used for the design of efficient routing-based search strategies. In this paper, we show that the combination of these two properties make it possible to design an efficient and simple fully decentralized search strategy. Further, simulations processed on real-world traces show that other unidentified properties hidden in actual queries make our protocol even more efficient, performing searches in logarithmic expected number of steps.

Jean-Loup Guillaume and Matthieu Latapy
Complex Systems 16, pages 83-94, 2005
Complex Network Metrology

In order to study some complex networks like the Internet, the Web, social networks or biological networks, one first has to explore them. This gives a partial and biased view of the real object, which is generally assumed to be representative of the whole. However, up to now nobody knows how and how much the measure influences the results. Using the example of the Internet and a rough model of its exploration process, we show that the way a given complex network is explored may strongly influence the observed properties. This leads us to argue for the necessity of developing a science of metrology of complex networks. Its aim would be to study how the partial and biased view of a network relates to the properties of the whole network.

Jean-Loup Guillaume, Matthieu Latapy and Damien Magoni
Computer Networks 50, pages 3197-3224, 2006. Extended abstract published in the proceedings of the 24-th IEEE international conference Infocom'05, 2005, Miami, USA
Relevance of Massively Distributed Explorations of the Internet Topology: Qualitative Results

Internet maps are generally constructed using the traceroute tool from a few sources to many destinations. It appeared recently that this exploration process gives a partial and biased view of the real topology, which leads to the idea of increasing the number of sources to improve the quality of the maps. In this paper, we present a set of experiments we have conducted to evaluate the relevance of this approach. It appears that the statistical properties of the underlying network have a strong influence on the quality of the obtained maps, which can be improved using massively distributed explorations. Conversely, some statistical properties are very robust, and so the known values for the Internet may be considered as reliable. We validate our analysis using real-world data and experiments, and we discuss its implications.

Fabien Viger and Matthieu Latapy
Submitted. Extended abstract published in LNCS, proceedings of the 11-th international conference on Computing and Combinatorics COCOON'05, 2005, Kunming, Yunnan, Chine
Random generation of large connected simple graphs with prescribed degree distribution

We address here the problem of generating random graphs uniformly from the set of simple connected graphs having a prescribed degree sequence. Our goal is to provide an algorithm suitable for practical use both because of its ability to generate very large graphs (efficiency) and because it is easy to implement (simplicity). We focus on a family of heuristics for which we introduce optimality conditions, and show how this optimality can be reached in practice. We then propose a different approach, specifically designed for real-world degree distributions, which outperforms the first one. Based on a conjecture which we discuss rigorously and study empirically, we finally reduce the best asymptotic complexity bound known so far.

Jean-Loup Guillaume, Matthieu Latapy and Stevens Le-Blond
LNCS, proceedings of the 6-th International Workshop on Distributed Computing IWDC'04, 2004, Kolkata, India
Statistical analysis of a P2P query graph based on degrees and their time-evolution

Despite their crucial impact on the performances of P2P systems, very few is known on peers behaviors in such networks. We propose here a study of these behaviors in a running environment using a semicentralised P2P system (eDonkey). To achieve this, we use a trace of the queries made to a large server managing up to fifty thousands peers simultaneously, and a few thousands queries per second. We analyse these data using complex network methods, and focus in particular on the degrees, their correlations, and their time-evolution. Results show a large variety of observed phenomena, including the variety of peers behaviors and heterogeneity of data queries, which should be taken into account when designing P2P systems.

Jean-Loup Guillaume and Matthieu Latapy
Information Processing Letters (IPL) 90:5, pages 215-221, 2004
Bipartite Structure of all Complex Networks

The analysis and modelling of various complex networks has received much attention in the last few years. Some such networks display a natural bipartite structure: two kinds of nodes coexist with links only between nodes of different kinds. This bipartite structure has not been deeply studied until now, mainly because it appeared to be specific to only a few complex networks. However, we show here that all complex networks can be viewed as bipartite structures sharing some important statistics, like degree distributions. The basic properties of complex networks can be viewed as consequences of this underlying bipartite structure. This leads us to propose the first simple and intuitive model for complex networks which captures the main properties met in practice.

Jean-Loup Guillaume, Matthieu Latapy and Laurent Viennot
LNCS, proceedings of the 3-rd international conference Web-Age Information Management WAIM'02, 2002, Beijing, Chine. Abstract published in the proceedings of the 11-th international conference World Wide Web WWW'02, 2002, Honolulu, Hawaï
Efficient and Simple Encodings for the Web Graph

In this paper, we propose a set of simple and efficient methods based on standard, free and widely available tools, to store and manipulate large sets of URLs and large parts of the Web graph. Our aim is both to store efficiently the URLs list and the graph in order to manage all the computations in a computer central memory. We also want to make the conversion between URLs and their identifiers as fast as possible, and to obtain all the successors of an URL in the Web graph efficiently. The methods we propose make it possible to obtain a good compromise between these two challenges, and make it possible to manipulate large parts of the Web graph.