Mapping low-resolution edges to high-resolution paths: the case of traffic measurements in cities

Bastien Legay, Matthieu Latapy

In: Botta, F., Macedo, M., Barbosa, H., Menezes, R. (eds) Complex Networks XV. CompleNet-Live 2024. Springer Proceedings in Complexity. Springer, Cham.

https://doi.org/10.1007/978-3-031-57515-0_1

We consider the following problem: we have a high-resolution street network of a given city, and low-resolution measurements of traffic within this city. We want to associate to each measurement the set of streets corresponding to the observed traffic. To do so, we take benefit of specific properties of these data to match measured links to links in the street network. We propose several success criteria for the obtained matching. They show that the matching algorithm generally performs very well, and they give complementary ways to detect data discrepancies that makes any matching highly dubious.

Download

Visite guidée de la distillerie de Scotch

François Pellegrini (LaBRI, Bordeaux)

jeudi 20 juin à 11h en salle 24-25/405, LIP6, Sorbonne Université

Le partitionnement de graphes est un problème très courant qui a de nombreuses applications dans le domaine de  l’informatique scientifique. Du fait de la taille croissante des problèmes à résoudre, de nombreuses mises en œuvre  parallèles d’algorithmes de partitionnement de graphes ont été proposées dans la littérature, que ce soit pour des multiprocesseurs à mémoire partagée ou des multi-ordinateurs à mémoire distribuée. Cet exposé présentera les  principales structures de données et les algorithmes mis en œuvre au sein des bibliothèques libScotch et  libPTScotch. Il se concentrera principalement sur les types d’algorithmes disponibles, plutôt que sur leurs détails  d’implémentation. Il abordera néanmoins quelques questions opérationnelles importantes, concernant la  reproductibilité et le multi-tâches.

BBK: a simpler, faster algorithm for enumerating maximal bicliques in large sparse bipartite graphs

Alexis Baudin, Clémence Magnien, Lionel Tabourier

preprint arXiv:2405.04428

Bipartite graphs are a prevalent modeling tool for real-world networks, capturing interactions between vertices of two different types. Within this framework, bicliques emerge as crucial structures when studying dense subgraphs: they are sets of vertices such that all vertices of the first type interact with all vertices of the second type. Therefore, they allow identifying groups of closely related vertices of the network, such as individuals with similar interests or webpages with similar contents. This article introduces a new algorithm designed for the exhaustive enumeration of maximal bicliques within a bipartite graph. This algorithm, called BBK for Bipartite Bron-Kerbosch, is a new extension to the bipartite case of the Bron-Kerbosch algorithm, which enumerates the maximal cliques in standard (non-bipartite) graphs. It is faster than the state-of-the-art algorithms and allows the enumeration on massive bipartite graphs that are not manageable with existing implementations. We analyze it theoretically to establish two complexity formulas: one as a function of the input and one as a function of the output characteristics of the algorithm. We also provide an open-access implementation of BBK in C++, which we use to experiment and validate its efficiency on massive real-world datasets and show that its execution time is shorter in practice than state-of-the art algorithms. These experiments also show that the order in which the vertices are processed, as well as the choice of one of the two types of vertices on which to initiate the enumeration have an impact on the computation time.

Download

De l’écosystème au graphe dynamique ou comment représenter la nature dans sa complexité.

Jacques Gignoux (iEES-Paris)

mardi 30 avril 2024 à 10h30 en salle 25-26/105, LIP6, Sorbonne Université

Transparents

L’écologie prétend étudier toutes les formes d’organisation du monde vivant, de la bactérie à la biosphère. Elle utilise pour cela le concept d’écosystème qui malgré sa simplicité autorise des représentations (graphiques, mathématiques, informatiques) efficaces très variées du monde qui nous entoure. Comme beaucoup de systèmes vivants, les écosystèmes sont qualifiés de systèmes complexes, susceptibles de présenter des propriétés émergentes comme la stabilité ou la résilience aux perturbations. Malheureusement, les définitions précise de l’émergence font défaut… ou sont beaucoup trop nombreuses pour être utilisables. Je montre qu’en se donnant une définition formelle d’un système possiblement complexe sous la forme d’un graphe dynamique, on peut arriver à la définition précise de 4 types d’émergence et des conditions dans lesquelles elles se manifestent. Sur cette base, on peut construire un simulateur généraliste applicable à l’écosystème et analysable en tant que système complexe avec des outils algorithmiques encore à définir, dont je proposerai un exemple. L’enjeu de ce travail est de fournir un cadre conceptuel permettant la comparaison des représentations/modèles d’écosystème et l’analyse de leurs propriétés émergentes.

Modèle de graphe hiérarchique pour la représentation et
l’analyse de la mobilité et du réseau maritime

Cyril Ray (École Navale / Arts et Métiers)

jeudi 28 mars 2024 à 14h en salle 25-26/105, LIP6, Sorbonne Université

La crise sanitaire et plus récemment la situation géopolitique internationale que nous traversons nous ont rappelé à quel point nos économies modernes étaient tributaires du transport international de marchandises en général et de la maritimisation des échanges internationaux en particulier (puisque 90% de ce transport s’effectuent par voie maritime). Le transport maritime est donc au coeur de nos économies globalisées. Désormais, à l’aide de nombreux capteurs, un large panel de données maritimes est collecté en continu, archivé, et exploité pour la réalisation de nombreuses applications (suivi des pêches, sécurisation de la navigation, planification de routes optimales, contrôle du respect des règles internationales, protection de la biodiversité…). Les bénéfices de cette numérisation de l’espace et de l’information maritime sont multiples. Elle offre de nombreuses opportunités pour appréhender, analyser, prédire les échanges maritimes par l’analyse des données. Durant cette présentation nous aborderons la conception d’un modèle de graphe de hiérarchique pour représenter les mobilités et le réseau de transport maritime. Le modèle est construit par agrégation de trajectoires sémantiques, elles-mêmes émergentes des données de localisation de navires. Le modèle de graphe se concentre sur les origines, destinations et points saillants des mobilités. Un lien entre les données géographiques et les nœuds du graphe est réalisé par indexation hexagonale hiérarchique. La présentation abordera également l’algorithme d’agrégation hiérarchique et les opérateurs de centralité et mesure d’évolution de la dynamique du graphe.

Road network structure and traffic patterns

Erwan Taillanter

mardi 14 mars 2024 à 10h30 en salle 24-25/405, LIP6, Sorbonne Université

Le trafic routier est un domaine habituellement associé aux sciences de l’ingénieur. Si des modèles physiques, basés sur des concepts d’hydrodynamique, ont été appliqué avec succès pour décrire le trafic sur des routes ou autoroutes, la question du trafic en milieu urbain, autrement plus complexe, est actuellement traitée principalement par le biais de simulations. Ces simulations s’avèrent malheureusement souvent imparfaites. Ceci provient du caractère chaotique du trafic routier, exacerbé sur un réseau urbain aux multiples facteurs exogènes (feux de circulation en tête), et où les intersections introduisent une forte corrélation entre les rues. Par conséquent, une simulation agent-basée, bien que réaliste à petite échelle, ne saurait décrire de façon satisfaisante l’évolution du système aux plus grandes échelles. Cette situation ressemble fortement à d’autres problèmes rencontrés dans le domaine de la physique, et en particulier dans le domaine des systèmes complexes. L’objet de ma thèse a donc été l’application d’outils issus de la physique statistique à la description du trafic routier en ville. La démarche générale est toujours de proposer des modèles « macroscopiques », ignorant la réalité individuelle des automobilistes pour se concentrer sur des grandeurs définies à l’échelle des rues ou du réseau entier. Cette discussion aura pour objectif de présenter deux de ces approches macroscopiques. D’une part, je présenterai un concept central en sciences du trafic urbain modernes, nommé Diagramme Fondamental Macroscopique (MFD). D’autre part, je présenterai des résultats suggérant que le trafic urbain se comporte comme un système physique subissant une transition de phase. Enfin, j’élargirai la discussion, en présentant des travaux tentant de combiner ces deux points de vues dans un point de vue cohérent.

Probabilistic k-swap method for uniform graph generation beyond the configuration model

Lionel Tabourier, Julien Karadayi

In Journal of Complex Networks, 2024, 12 (1)

DOI: 10.1093/comnet/cnae002

Generating graphs with realistic structural characteristics is an important challenge for complex networks analysis, as these graphs are the null models that allow to describe and understand the properties of real-world networks. However, the field lacks systematic Generating graphs with realistic structural characteristics is an important challenge for complex networks analysis, as these graphs are the null models that allow to describe and understand the properties of real-world networks. However, the field lacks systematic means to generate samples of graphs with predefined structural properties, because it is difficult to devise a method that is both flexible and guarantees to get a uniform sample, i.e., where any graph of the target set has the same probability to be represented in the sample. In practice, it limits the experimental investigation to a handful of models, including the well-known Erdős-Rényi graphs or the configuration model. The aim of this paper is to provide such a method: we design and implement a Monte-Carlo Markov Chain process which is both flexible and satisfies the uniformity condition. Its assumptions are that: 1) the graphs are simple, 2) their degree sequence is fixed, 3) the user has at least one graph of the set available. Within these limitations, we prove that it is possible to generate a uniform sample of any set of such graphs. We provide an implementation in python and extensive experiments to show that this method is practically operational in several relevant cases. We use it with five specific set of constraints and verify that the samples obtained are consistent with existing methods when such a method is available. In those cases, we report that state-of-the-art methods are usually faster, as our method favors versatility at the cost of a lower efficiency. Also, the implementation provided has been designed so that users may adapt it to relevant constraints for their own field of work.

Download

Approches hybrides de détection d’anomalies dans les transactions financières.

Blaise Ngonmang

jeudi 11 janvier 2024 à 14h en salle 24-25/509, LIP6, Sorbonne Université

La fraude constitue un défi omniprésent dans divers secteurs tels que les services financiers et publics, engendrant d’importantes pertes financières pour les entreprises et institutions concernées. La prévention de la fraude représente ainsi un enjeu crucial. Cependant, la simple détection de la fraude ne suffit pas à atténuer ses conséquences; il est impératif de pouvoir la prouver opérationnellement. Ce séminaire explore des techniques de détection de fraudes basées sur des approches hybrides combinant l’apprentissage automatique et l’analyse de graphes. De plus, nous présentons des stratégies visant à faciliter l’interprétation des fraudes détectées.

Representing Edge Flows on Graphs via Sparse Cell Complexes

Josef Hoppe

vendredi 27 novembre 2023 à 11h en salle 24-25/405, LIP6, Sorbonne Université

Slides, codes, presentation at LoG

Obtaining sparse, interpretable representations of observable data is crucial in many machine learning and signal processing tasks. For data representing flows along the edges of a graph, an intuitively interpretable way to obtain such representations is to lift the graph structure to a simplicial complex: The eigenvectors of the associated Hodge-Laplacian, respectively the incidence matrices of the corresponding simplicial complex then induce a Hodge decomposition, which can be used to represent the observed data in terms of gradient, curl, and harmonic flows. In this paper, we generalize this approach to cellular complexes and introduce the flow representation learning problem, i.e., the problem of augmenting the observed graph by a set of cells, such that the eigenvectors of the associated Hodge Laplacian provide a sparse, interpretable representation of the observed edge flows on the graph. We show that this problem is NP-hard and introduce an efficient approximation algorithm for its solution. Experiments on real-world and synthetic data demonstrate that our algorithm outperforms state-of-the-art methods with respect to approximation error, while being computationally efficient.

Link Streams as a Generalization of Graphs and Time Series

Esteban Bautista, Matthieu Latapy

In « Temporal Network Theory », Holme, P. and Saramäki, J. (eds), Springer, 2023

DOI: 10.1007/978-3-031-30399-9_22œ

A link stream is a set of possibly weighted triplets (t, u, v) modeling that u and v interacted at time t. Link streams offer an effective model for datasets containing both temporal and relational information, making their proper analysis crucial in many applications. They are commonly regarded as sequences of graphs or collections of time series. Yet, a recent seminal work demonstrated that link streams are more general objects of which graphs are only particular cases. It therefore started the construction of a dedicated formalism for link streams by extending graph theory. In this work, we contribute to the development of this formalism by showing that link streams also generalize time series. In particular, we show that a link stream corresponds to a time-series extended to a relational dimension, which opens the door to also extend the framework of signal processing to link streams. We therefore develop extensions of numerous signal concepts to link streams: from elementary ones like energy, correlation, and differentiation, to more advanced ones like Fourier transform and filters.

Download

Computing Betweenness Centrality in Link Stream

Frédéric Simard, Clémence Magnien, Matthieu Latapy

Journal of Graph Algorithms and Applications 27:3, 2023 DOI: 10.7155/jgaa.00620

Betweenness centrality is one of the most important concepts in graph analysis. It was recently extended to link streams, a graph generalization where links arrive over time. However, its computation raises non-trivial issues, due in particular to the fact that time is considered as continuous. We provide here the first algorithms to compute this generalized betweenness centrality, as well as several companion algorithms that have their own interest. They work in polynomial time and space, we illustrate them on typical examples, and we provide an implementation.

Download

Code available here

A Frequency-Structure Approach for Link Stream Analysis

Esteban Bautista, Matthieu Latapy

Fifth IEEE International Conference on Cognitive Machine Intelligence (CogMI), 2023

A link stream is a set of triplets (t,u,v) indicating that u and v interacted at time t. Link streams model numerous datasets and their proper study is crucial in many applications. In practice, raw link streams are often aggregated or transformed into time series or graphs where decisions are made. Yet, it remains unclear how the dynamical and structural information of a raw link stream carries into the transformed object. This work shows that it is possible to shed light into this question by studying link streams via algebraically linear graph and signal operators, for which we introduce a novel linear matrix framework for the analysis of link streams. We show that, due to their linearity, most methods in signal processing can be easily adopted by our framework to analyze the time/frequency information of link streams. However, the availability of linear graph methods to analyze relational/structural information is limited. We address this limitation by developing (i) a new basis for graphs that allow us to decompose them into structures at different resolution levels; and (ii) filters for graphs that allow us to change their structural information in a controlled manner. By plugging-in these developments and their time-domain counterpart into our framework, we are able to (i) obtain a new basis for link streams that allow us to represent them in a frequency-structure domain; and (ii) show that many interesting transformations to link streams, like the aggregation of interactions or their embedding into a euclidean space, can be seen as simple filters in our frequency-structure domain.

Download

How to assess and optimize the energy efficiency of microservices placement

Imane Taleb

vendredi 17 novembre 2023 à 14h en salle 26-00/124, LIP6, Sorbonne Université

Microservices are small, independent and scalable services used to build applications, offering flexibility and high-quality service. However, this model presents challenges in terms of network congestion, microservice placement, resource management and energy consumption. Based on an analysis revealing a lack of research on energy optimisation, this thesis focuses on assessing the energy efficiency of microservice placement, using graph partitioning techniques to optimise microservice placement across network architecture layers (Cloud, Fog, Edge).

LSCPM: Communities in Massive Real-World Link Streams by Clique Percolation Method

Alexis Baudin, Lionel Tabourier, Clémence Magnien

30th International Symposium on Temporal Representation and Reasoning (TIME 2023)

Community detection is a popular approach to understand the organization of interactions in static networks. For that purpose, the Clique Percolation Method (CPM), which involves the percolation of k-cliques, is a well-studied technique that offers several advantages. Besides, studying interactions that occur over time is useful in various contexts, which can be modeled by the link stream formalism. The Dynamic Clique Percolation Method (DCPM) has been proposed for extending CPM to temporal networks.
However, existing implementations are unable to handle massive datasets. We present a novel algorithm that adapts CPM to link streams, which has the advantage that it allows us to speed up the computation time with respect to the existing DCPM method. We evaluate it experimentally on real datasets and show that it scales to massive link streams. For example, it allows to obtain a complete set of communities in under twenty-five minutes for a dataset with thirty million links, what the state of the art fails to achieve even after a week of computation. We further show that our method provides communities similar to DCPM, but slightly more aggregated. We exhibit the relevance of the obtained communities in real world cases, and show that they provide information on the importance of vertices in the link streams.

Download

Monotonicity on undirected networks

Sebastiano Vigna

mercredi 24 mai 2023 à 11h en salle 24-25/405, LIP6, Sorbonne Université

Slides

Is it always beneficial to create a new relationship (have a new follower/friend) in a social network? This question can be formally stated as a property of the centrality measure that defines the importance of the actors of the network. Score monotonicity means that adding an arc increases the centrality score of the target of the arc; rank monotonicity means that adding an arc improves the importance of the target of the arc relatively to the remaining nodes. It is known that most centralities are both score and rank monotone on directed, strongly connected graphs. In this paper, we study the problem of score and rank monotonicity for classical centrality measures in the case of undirected networks: in this case, we require that score, or relative importance, improve at both endpoints of the new edge. We show that, surprisingly, the situation in the undirected case is very different, and in particular that closeness, harmonic centrality, betweenness, eigenvector centrality, Seeley’s index, Katz’s index, and PageRank are not rank monotone; betweenness and PageRank are not even score monotone. In other words, while it is always a good thing to get a new follower, it is not always beneficial to get a new friend.

Continuous Average Straightness in Spatial Graphs

Vincent Labatut

vendredi 12 mai 2023 à 11h en salle 26-00/228 LIP6, Sorbonne Université

Slides

Straightness is a measure designed to characterize a pair of vertices in a spatial graph. In practice, it is often averaged over the whole graph, or a part of it. The standard approach consists in: 1) discretizing the graph edges, 2) processing the vertex-to-vertex Straightness considering the additional vertices resulting from this discretization, and 3) averaging the obtained values. However, this discrete approximation can be computationally expensive on large graphs, and its precision has not been clearly assessed. In this work, we adopt a continuous approach to average the Straightness over the edges of spatial graphs. This allows us to derive 5 distinct measures able to characterize precisely the accessibility of the whole graph, as well as individual vertices and edges. Our method is generic and could be applied to other measures designed for spatial graphs. We perform an experimental evaluation of our continuous average Straightness measures, and show how they behave differently from the traditional vertex-to-vertex ones. Moreover, we also study their discrete approximations, and show that our approach is globally less demanding in terms of both processing time and memory usage.

Penser l’ingénierie et ses outils informatiques dans des contextes de soutenabilité forte : le cas de l’analyse environnementale en conception

Lou Grimal

mardi 9 mai 2023 à 11h en salle 26-00/428 LIP6, Sorbonne Université

Une transition des systèmes socio-techniques semble être nécessaire afin de limiter le dépassement des limites planétaires. L’ingénierie, activité influencée par le contexte historique et épistémologique de son époque, peut être un levier pour cette transition. Ma thèse a permis de proposer un cadre théorique pour comprendre comment l’ingénierie peut exister dans des contextes de soutenabilité forte et explorer des outils informatiques qui peuvent être déployés dans ces contextes. Ce cadre est composé de 4 caractéristiques (éthique, objectif, démarche, expertise) et adresse le niveau de l’ingénierie et le niveau des interactions entre l’ingénieur et ses outils informatiques. La méthodologie Value Sensitive Design a été mise en œuvre pour explorer ce cadre théorique. Deux expérimentations ont été conduites pour tester une première médiatisation de la permaingénierie dans un outil informatique – outil mettant en œuvre une démarche d’analyse environnementale collaborative. Ces expérimentations ont permis de montrer un besoin de cadre conceptuel pour une ingénierie en contexte de soutenabilité forte et un manque de pratique des acteurs exprimant ce besoin. Trois contributions ont été identifiées : (1) la formalisation du cadre de permaingénierie, (2) l’approche par les interactions humains-machines pour adresser les questions de transitions culturelles et changement de valeurs, (3) l’impossibilité de transformer un outil d’ingénierie de soutenabilité faible en outil de soutenabilité forte.

Hearing the shape of a robotic arena: Spectral shape analysis by Kilobots

Main author and presenter: Leo Cazenille
Other authors: Nicolas Lobato-Dauzier, Alessia Loi, Mika Ito, Olivier Marchal, Nathanael Aubert-Kato, Nicolas Bredeche, Anthony J. Genot

April 18th, 2023, 11am
Room: 24-25/405

Biological swarms have showcased their extraordinary capabilities in tackling geometric challenges by employing limited perception and mobility. They achieve this feat by internally diffusing information to bridge the gap between local and global scales, ultimately facilitating collective consensus and decision-making, even when individual agents only have access to local information. In this study, we strive to adapt this paradigm to robotic swarms, which consist of small robots with constrained sensing and computational abilities.

Our bio-inspired approach leverages spectral shape analysis, enabling the robotic swarms to identify the shape of a given arena. By estimating the second eigenvalue of the Laplacian collectively through information exchange, the robots can effectively obtain a fingerprint of the arena’s geometry. This metric, known as algebraic connectivity, proves invaluable in the context of swarm-based problem-solving and coordination.

To evaluate the performance of our proposed method, we conducted experiments involving 25 real robots as well as simulations using Kilombo, a state-of-the-art simulator for kilobots. Our objective was to assess the efficacy of our approach by attempting to classify a set of 8 shapes with varying geometric properties. The results of these experiments and simulations indicate that the diffusion-based spectral analysis can indeed empower robotic swarms to accurately sense and classify the geometry of their environment.

In conclusion, our innovative approach offers a promising avenue for advancing swarm-based problem-solving and coordination by drawing inspiration from the remarkable capabilities of biological swarms in addressing geometric challenges with limited perception and mobility.

(Presentation in French with slides in English).

Faster maximal clique enumeration in large real-world link streams

Alexis Baudin, Clémence Magnien, Lionel Tabourier

Journal of Graph Algorithms and Applications, Vol. 28 No. 1 (2024), p. 149-178

Link streams offer a good model for representing interactions over time. They consist of links (b,e,u,v), where u and v are vertices interacting during the whole time interval [b,e]. In this paper, we deal with the problem of enumerating maximal cliques in link streams. A clique is a pair (C,[t0,t1]), where C is a set of vertices that all interact pairwise during the full interval [t0,t1]. It is maximal when neither its set of vertices nor its time interval can be increased. Some of the main works solving this problem are based on the famous Bron-Kerbosch algorithm for enumerating maximal cliques in graphs. We take this idea as a starting point to propose a new algorithm which matches the cliques of the instantaneous graphs formed by links existing at a given time t to the maximal cliques of the link stream. We prove its validity and compute its complexity, which is better than the state-of-the art ones in many cases of interest. We also study the output-sensitive complexity, which is close to the output size, thereby showing that our algorithm is efficient. To confirm this, we perform experiments on link streams used in the state of the art, and on massive link streams, up to 100 million links. In all cases our algorithm is faster, mostly by a factor of at least 10 and up to a factor of 10**4. Moreover, it scales to massive link streams for which the existing algorithms are not able to provide the solution.

Download

Énumération efficace des cliques maximales dans les flots de liens réels massifs

Alexis Baudin, Clémence Magnien, Lionel Tabourier

EGC 2023, vol. RNTI-E-39, pp.139-150

Les flots de liens offrent un formalisme de description d’interactions au cours du temps. Un lien correspond à deux sommets qui interagissent sur un intervalle de temps. Une clique est un ensemble de sommets associé à un intervalle de temps durant lequel ils sont tous connectés. Elle est maximale si ni son ensemble de sommets ni son intervalle de temps ne peuvent être augmentés. Les algorithmes existants pour énumérer ces structures ne permettent pas de traiter des jeux de données réels de plus de quelques centaines de milliers d’interactions. Or, l’accès à des données toujours plus massives demande d’adapter les outils à de plus grandes échelles. Nous proposons alors un algorithme qui énumère les cliques maximales sur des réseaux temporels réels et massifs atteignant jusqu’à plus de 100 millions de liens. Nous montrons expérimentalement qu’il améliore l’état de l’art de plusieurs ordres de grandeur.

Download