Friday, November 27th, 2023, 11am, room 24-25/405, LIP6, Sorbonne Université
Slides, code, and presentation at LoG
Obtaining sparse, interpretable representations of observable data is crucial in many machine learning and signal processing tasks. For data representing flows along the edges of a graph, an intuitively interpretable way to obtain such representations is to lift the graph structure to a simplicial complex: the eigenvectors of the associated Hodge Laplacian, or equivalently the incidence matrices of the corresponding simplicial complex, then induce a Hodge decomposition, which can be used to represent the observed data in terms of gradient, curl, and harmonic flows. In this paper, we generalize this approach to cellular complexes and introduce the flow representation learning problem, i.e., the problem of augmenting the observed graph by a set of cells, such that the eigenvectors of the associated Hodge Laplacian provide a sparse, interpretable representation of the observed edge flows on the graph. We show that this problem is NP-hard and introduce an efficient approximation algorithm for its solution. Experiments on real-world and synthetic data demonstrate that our algorithm outperforms state-of-the-art methods with respect to approximation error, while being computationally efficient.
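For intuition, here is a minimal numpy sketch of the underlying Hodge decomposition (not the paper's representation-learning algorithm); the node-edge incidence matrix B1, the edge-cell incidence matrix B2, and the edge flow vector f are assumed inputs:

```python
import numpy as np

def hodge_decompose(f, B1, B2):
    """Split an edge flow f into gradient, curl, and harmonic parts,
    given node-edge incidence B1 and edge-cell incidence B2."""
    # Gradient part: least-squares projection of f onto im(B1^T)
    p, *_ = np.linalg.lstsq(B1.T, f, rcond=None)
    f_grad = B1.T @ p
    # Curl part: projection of the remainder onto im(B2)
    c, *_ = np.linalg.lstsq(B2, f - f_grad, rcond=None)
    f_curl = B2 @ c
    # Harmonic part: the rest, lying in ker(B1) and ker(B2^T)
    return f_grad, f_curl, f - f_grad - f_curl
```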
Esteban Bautista, Matthieu Latapy
In "Temporal Network Theory", Holme, P. and Saramäki, J. (eds), Springer, 2023
A link stream is a set of possibly weighted triplets (t, u, v) modeling that u and v interacted at time t. Link streams offer an effective model for datasets containing both temporal and relational information, making their proper analysis crucial in many applications. They are commonly regarded as sequences of graphs or collections of time series. Yet, a recent seminal work demonstrated that link streams are more general objects, of which graphs are only particular cases. It therefore started the construction of a dedicated formalism for link streams by extending graph theory. In this work, we contribute to the development of this formalism by showing that link streams also generalize time series. In particular, we show that a link stream corresponds to a time series extended to a relational dimension, which opens the door to extending the framework of signal processing to link streams as well. We therefore develop extensions of numerous signal concepts to link streams: from elementary ones like energy, correlation, and differentiation, to more advanced ones like Fourier transform and filters.
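As a toy illustration of this signal-processing view (our own sketch, not the paper's formalism), a weighted link stream can be stored as tuples (t, u, v, w) and elementary signal quantities read off directly:

```python
from collections import defaultdict

# Hypothetical toy stream: (time, node, node, weight)
stream = [(0, 'a', 'b', 1.0), (1, 'a', 'c', 2.0), (2, 'b', 'c', 1.0)]

def energy(stream):
    # Analogue of a signal's energy, sum over t of |x(t)|^2
    return sum(w * w for _, _, _, w in stream)

def edge_series(stream):
    # View the stream as one time series per node pair
    series = defaultdict(list)
    for t, u, v, w in stream:
        series[(u, v)].append((t, w))
    return series
```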
Frédéric Simard, Clémence Magnien, Matthieu Latapy
Journal of Graph Algorithms and Applications 27:3, 2023 DOI: 10.7155/jgaa.00620
Betweenness centrality is one of the most important concepts in graph analysis. It was recently extended to link streams, a graph generalization where links arrive over time. However, its computation raises non-trivial issues, due in particular to the fact that time is considered as continuous. We provide here the first algorithms to compute this generalized betweenness centrality, as well as several companion algorithms that have their own interest. They work in polynomial time and space, we illustrate them on typical examples, and we provide an implementation.
Code available here
Esteban Bautista, Matthieu Latapy
Fifth IEEE International Conference on Cognitive Machine Intelligence (CogMI), 2023
A link stream is a set of triplets (t,u,v) indicating that u and v interacted at time t. Link streams model numerous datasets and their proper study is crucial in many applications. In practice, raw link streams are often aggregated or transformed into time series or graphs where decisions are made. Yet, it remains unclear how the dynamical and structural information of a raw link stream carries into the transformed object. This work shows that it is possible to shed light on this question by studying link streams via algebraically linear graph and signal operators, for which we introduce a novel linear matrix framework for the analysis of link streams. We show that, due to their linearity, most methods in signal processing can be easily adopted by our framework to analyze the time/frequency information of link streams. However, the availability of linear graph methods to analyze relational/structural information is limited. We address this limitation by developing (i) a new basis for graphs that allows us to decompose them into structures at different resolution levels; and (ii) filters for graphs that allow us to change their structural information in a controlled manner. By plugging these developments and their time-domain counterparts into our framework, we are able to (i) obtain a new basis for link streams that allows us to represent them in a frequency-structure domain; and (ii) show that many interesting transformations of link streams, like the aggregation of interactions or their embedding into a Euclidean space, can be seen as simple filters in our frequency-structure domain.
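To make the filtering vocabulary concrete, here is a standard graph-signal-processing filter in the Laplacian eigenbasis (a generic sketch; the paper's frequency-structure basis is its own construction):

```python
import numpy as np

def graph_filter(L, x, h):
    """Apply a spectral filter h to a vertex signal x, where L is a
    symmetric graph Laplacian (assumed input)."""
    lam, U = np.linalg.eigh(L)   # graph Fourier basis
    x_hat = U.T @ x              # forward transform
    return U @ (h(lam) * x_hat)  # filter each frequency, invert

# Example: a low-pass filter that keeps the smooth structure
# y = graph_filter(L, x, lambda lam: 1.0 / (1.0 + lam))
```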
Friday, November 17th, 2023, 2pm, room 26-00/124, LIP6, Sorbonne Université
Microservices are small, independent and scalable services used to build applications, offering flexibility and high-quality service. However, this model presents challenges in terms of network congestion, microservice placement, resource management and energy consumption. Based on an analysis revealing a lack of research on energy optimisation, this thesis focuses on assessing the energy efficiency of microservice placement, using graph partitioning techniques to optimise placement across network architecture layers (Cloud, Fog, Edge).
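As an illustration of the kind of graph-partitioning primitive involved (not the thesis's placement method), networkx can bisect a service dependency graph into two hosting groups:

```python
import networkx as nx
from networkx.algorithms.community import kernighan_lin_bisection

# Stand-in for a microservice call graph (assumption for illustration)
G = nx.karate_club_graph()
# Split services into two groups while cutting few call edges
group_a, group_b = kernighan_lin_bisection(G, seed=0)
print(len(group_a), len(group_b))
```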
Alexis Baudin, Lionel Tabourier, Clémence Magnien
30th International Symposium on Temporal Representation and Reasoning (TIME 2023)
Community detection is a popular approach to understand the organization of interactions in static networks. For that purpose, the Clique Percolation Method (CPM), which involves the percolation of k-cliques, is a well-studied technique that offers several advantages. Besides, studying interactions that occur over time is useful in various contexts; such interactions can be modeled by the link stream formalism. The Dynamic Clique Percolation Method (DCPM) has been proposed for extending CPM to temporal networks.
However, existing implementations are unable to handle massive datasets. We present a novel algorithm that adapts CPM to link streams, and which significantly speeds up the computation with respect to the existing DCPM method. We evaluate it experimentally on real datasets and show that it scales to massive link streams. For example, it obtains a complete set of communities in under twenty-five minutes for a dataset with thirty million links, which the state of the art fails to achieve even after a week of computation. We further show that our method provides communities similar to DCPM, but slightly more aggregated. We exhibit the relevance of the obtained communities in real-world cases, and show that they provide information on the importance of vertices in the link streams.
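For reference, the static CPM that this work extends is available in networkx (the link-stream variant itself is not); a minimal example:

```python
import networkx as nx
from networkx.algorithms.community import k_clique_communities

# Two triangles sharing an edge percolate into one k=3 community
G = nx.Graph([(1, 2), (1, 3), (2, 3), (2, 4), (3, 4), (4, 5)])
communities = [set(c) for c in k_clique_communities(G, 3)]
print(communities)  # [{1, 2, 3, 4}]
```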
Wednesday, May 24th, 2023, 11am, room 24-25/405, LIP6, Sorbonne Université
Is it always beneficial to create a new relationship (have a new follower/friend) in a social network? This question can be formally stated as a property of the centrality measure that defines the importance of the actors of the network. Score monotonicity means that adding an arc increases the centrality score of the target of the arc; rank monotonicity means that adding an arc improves the importance of the target of the arc relatively to the remaining nodes. It is known that most centralities are both score and rank monotone on directed, strongly connected graphs. In this paper, we study the problem of score and rank monotonicity for classical centrality measures in the case of undirected networks: in this case, we require that score, or relative importance, improve at both endpoints of the new edge. We show that, surprisingly, the situation in the undirected case is very different, and in particular that closeness, harmonic centrality, betweenness, eigenvector centrality, Seeley’s index, Katz’s index, and PageRank are not rank monotone; betweenness and PageRank are not even score monotone. In other words, while it is always a good thing to get a new follower, it is not always beneficial to get a new friend.
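A simple way to probe score monotonicity empirically (an illustrative sketch, not the paper's proofs) is to compare a centrality at both endpoints before and after inserting the edge:

```python
import networkx as nx

def score_change(G, u, v):
    """Change in PageRank at u and v when the edge (u, v) is added."""
    before = nx.pagerank(G)
    H = G.copy()
    H.add_edge(u, v)
    after = nx.pagerank(H)
    return after[u] - before[u], after[v] - before[v]

# A negative change at either endpoint witnesses a violation of
# score monotonicity on undirected graphs.
```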
Friday, May 12th, 2023, 11am, room 26-00/228, LIP6, Sorbonne Université
Straightness is a measure designed to characterize a pair of vertices in a spatial graph. In practice, it is often averaged over the whole graph, or a part of it. The standard approach consists in: 1) discretizing the graph edges, 2) processing the vertex-to-vertex Straightness considering the additional vertices resulting from this discretization, and 3) averaging the obtained values. However, this discrete approximation can be computationally expensive on large graphs, and its precision has not been clearly assessed. In this work, we adopt a continuous approach to average the Straightness over the edges of spatial graphs. This allows us to derive 5 distinct measures able to characterize precisely the accessibility of the whole graph, as well as individual vertices and edges. Our method is generic and could be applied to other measures designed for spatial graphs. We perform an experimental evaluation of our continuous average Straightness measures, and show how they behave differently from the traditional vertex-to-vertex ones. Moreover, we also study their discrete approximations, and show that our approach is globally less demanding in terms of both processing time and memory usage.
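For concreteness, vertex-to-vertex Straightness is simply the ratio of Euclidean to graph distance; a sketch for a networkx graph with assumed 'pos' node attributes and 'length' edge attributes:

```python
import networkx as nx
from math import dist

def straightness(G, u, v):
    d_graph = nx.shortest_path_length(G, u, v, weight='length')
    d_euclid = dist(G.nodes[u]['pos'], G.nodes[v]['pos'])
    # Equals 1 when the network path is perfectly straight
    return d_euclid / d_graph if d_graph > 0 else 1.0
```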
Tuesday, May 9th, 2023, 11am, room 26-00/428, LIP6, Sorbonne Université
A transition of socio-technical systems appears necessary to avoid overshooting planetary boundaries. Engineering, an activity shaped by the historical and epistemological context of its time, can be a lever for this transition. My thesis proposes a theoretical framework for understanding how engineering can exist in contexts of strong sustainability, and explores software tools that can be deployed in such contexts. This framework comprises four characteristics (ethics, objective, approach, expertise) and addresses both the level of engineering itself and the level of the interactions between engineers and their software tools. The Value Sensitive Design methodology was used to explore this theoretical framework. Two experiments were conducted to test a first embodiment of perma-engineering in a software tool, namely a tool implementing a collaborative environmental analysis approach. These experiments revealed the need for a conceptual framework for engineering in strong-sustainability contexts, as well as a lack of practice among the actors expressing this need. Three contributions were identified: (1) the formalization of the perma-engineering framework, (2) the human-computer interaction approach to questions of cultural transition and value change, (3) the impossibility of transforming a weak-sustainability engineering tool into a strong-sustainability one.
Main author and presenter: Leo Cazenille
Other authors: Nicolas Lobato-Dauzier, Alessia Loi, Mika Ito, Olivier Marchal, Nathanael Aubert-Kato, Nicolas Bredeche, Anthony J. Genot
April 18th, 2023, 11am
Biological swarms have showcased extraordinary capabilities in tackling geometric challenges despite limited perception and mobility. They achieve this feat by internally diffusing information to bridge the gap between local and global scales, ultimately facilitating collective consensus and decision-making, even when individual agents only have access to local information. In this study, we strive to adapt this paradigm to robotic swarms, which consist of small robots with constrained sensing and computational abilities.
Our bio-inspired approach leverages spectral shape analysis, enabling the robotic swarms to identify the shape of a given arena. By estimating the second eigenvalue of the Laplacian collectively through information exchange, the robots can effectively obtain a fingerprint of the arena’s geometry. This metric, known as algebraic connectivity, proves invaluable in the context of swarm-based problem-solving and coordination.
To evaluate the performance of our proposed method, we conducted experiments involving 25 real robots as well as simulations using Kilombo, a state-of-the-art simulator for kilobots. Our objective was to assess the efficacy of our approach by attempting to classify a set of 8 shapes with varying geometric properties. The results of these experiments and simulations indicate that the diffusion-based spectral analysis can indeed empower robotic swarms to accurately sense and classify the geometry of their environment.
In conclusion, our innovative approach offers a promising avenue for advancing swarm-based problem-solving and coordination by drawing inspiration from the remarkable capabilities of biological swarms in addressing geometric challenges with limited perception and mobility.
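For comparison, the quantity the robots estimate in a distributed fashion can be computed centrally in one line (a reference sketch, not the swarm algorithm):

```python
import networkx as nx

# Stand-in for a swarm communication graph (assumption)
G = nx.grid_2d_graph(5, 5)
# Second-smallest Laplacian eigenvalue, the arena "fingerprint"
lam2 = nx.algebraic_connectivity(G)
print(f"algebraic connectivity: {lam2:.4f}")
```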
(Presentation in French with slides in English).
Alexis Baudin, Clémence Magnien, Lionel Tabourier
arXiv preprint arXiv:2302.00360
Link streams offer a good model for representing interactions over time. They consist of links (b,e,u,v), where u and v are vertices interacting during the whole time interval [b,e]. In this paper, we deal with the problem of enumerating maximal cliques in link streams. A clique is a pair (C,[t0,t1]), where C is a set of vertices that all interact pairwise during the full interval [t0,t1]. It is maximal when neither its set of vertices nor its time interval can be increased. Some of the main works solving this problem are based on the famous Bron-Kerbosch algorithm for enumerating maximal cliques in graphs. We take this idea as a starting point to propose a new algorithm which matches the cliques of the instantaneous graphs, formed by links existing at a given time t, to the maximal cliques of the link stream. We prove its validity and compute its complexity, which is better than the state-of-the-art ones in many cases of interest. We also study the output-sensitive complexity, which is close to the output size, thereby showing that our algorithm is efficient. To confirm this, we perform experiments on link streams used in the state of the art, and on massive link streams, up to 100 million links. In all cases our algorithm is faster, mostly by a factor of at least 10 and up to a factor of 10^4. Moreover, it scales to massive link streams for which the existing algorithms are not able to provide the solution.
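For background, here is the textbook Bron-Kerbosch recursion on static graphs, which the paper takes as its starting point (basic version without pivoting; N maps each vertex to its neighbor set):

```python
def bron_kerbosch(R, P, X, N, out):
    """Append every maximal clique extending R to out."""
    if not P and not X:
        out.append(set(R))
        return
    for v in list(P):
        bron_kerbosch(R | {v}, P & N[v], X & N[v], N, out)
        P.remove(v)
        X.add(v)

# Usage: out = []; bron_kerbosch(set(), set(N), set(), N, out)
```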
Alexis Baudin, Clémence Magnien, Lionel Tabourier
EGC 2023, vol. RNTI-E-39, pp.139-150
Link streams provide a formalism for describing interactions over time. A link corresponds to two vertices interacting over a time interval. A clique is a set of vertices together with a time interval during which they are all pairwise connected. It is maximal if neither its vertex set nor its time interval can be extended. Existing algorithms for enumerating these structures cannot handle real datasets with more than a few hundred thousand interactions. Yet, access to ever more massive data requires adapting the tools to larger scales. We therefore propose an algorithm that enumerates maximal cliques on real, massive temporal networks with up to more than 100 million links. We show experimentally that it improves on the state of the art by several orders of magnitude.
Wednesday, January 25th, 2023, room 26-00/332, LIP6, Sorbonne Université
A 2-hop labeling (a.k.a. hub labeling) consists in assigning to each node of a graph a small subset of nodes called hubs so that any pair of nodes have a common hub lying on a shortest path joining them. Such labelings appeared to provide a very efficient representation of distances in practical road networks, where surprisingly small hub sets can be computed. A graph parameter called skeleton dimension allows one to explain this behaviour. Connecting any two nodes through a common hub can be seen as a 2-hop shortest path in a super-graph of the original graph. A natural extension is to consider more hops; it is related to the notion of hopsets introduced in the parallel computation of shortest paths. Surprisingly, a 3-hop construction leads to a data structure for representing distances which is asymptotically both smaller and faster than 2-hop labeling in graphs of bounded skeleton dimension. Another natural question is to ask whether 2-hop labelings can be efficient more generally in sparse graphs. Unfortunately, this is not the case, as there exist bounded-degree graphs that require quasi-linear average hub set size. The construction of such difficult graphs is related to the construction of dense graphs with n nodes that can be decomposed into n induced matchings, as introduced by Ruzsa and Szemerédi in the seventies.
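To fix ideas, a 2-hop distance query is a single intersection of the two hub sets (a minimal sketch; lab[u] is assumed to map each hub h of u to d(u, h)):

```python
def query(lab, u, v):
    """Distance between u and v from their 2-hop labels; correct
    whenever some shortest u-v path goes through a common hub."""
    common = lab[u].keys() & lab[v].keys()
    return min((lab[u][h] + lab[v][h] for h in common),
               default=float('inf'))
```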
Fabrice Lécuyer, Louis Jachiet, Clémence Magnien, Lionel Tabourier
Listing triangles is a fundamental graph problem with many applications, and large graphs require fast algorithms. Vertex ordering allows the orientation of edges from lower to higher vertex indices, and state-of-the-art triangle listing algorithms use this to accelerate their execution and to bound their time complexity. Yet, only basic orderings have been tested. In this paper, we show that studying the precise cost of algorithms instead of their bounded complexity leads to faster solutions. We introduce cost functions that link ordering properties with the running time of a given algorithm. We prove that their minimization is NP-hard and propose heuristics to obtain new orderings with different trade-offs between cost reduction and ordering time. Using datasets with up to two billion edges, we show that our heuristics accelerate the listing of triangles by an average of 38% when the ordering is already given as an input, and 16% when the ordering time is included.
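The ordering-based strategy the paper optimizes can be sketched as follows: orient every edge from its lower- to its higher-ranked endpoint, then intersect out-neighborhoods (a generic version; the paper's contribution is in choosing the rank):

```python
def list_triangles(edges, rank):
    """List each triangle once, given a vertex ranking."""
    out = {}
    for u, v in edges:
        if rank[u] > rank[v]:
            u, v = v, u
        out.setdefault(u, set()).add(v)  # orient low -> high
        out.setdefault(v, set())
    return [(u, v, w)
            for u in out for v in out[u] for w in out[u] & out[v]]

# A common baseline ranks vertices by non-decreasing degree.
```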
Antoine Genitrini, Mehdi Naima, Olivier Bodini
The 15th Latin American Theoretical Informatics Symposium (LATIN 2022)
In this paper we study a model of Schröder trees whose labelling is increasing along the branches. Such a tree family is useful in the context of phylogenetics. The tree nodes are of arbitrary arity (i.e. out-degree) and the node labels can be repeated throughout different branches of the tree. Once a formal construction of the trees is laid out, we turn to their enumeration, inspired by a renormalisation due to Stanley on acyclic orientations of graphs. We thus exhibit links between our tree model and labelled graphs and prove a one-to-one correspondence between a subclass of our trees and labelled graphs. As a by-product we obtain a new natural combinatorial interpretation of Stanley's renormalising factor. We then study different combinatorial characteristics of our tree model and finally, we design an efficient uniform random sampler for our tree model, which allows generating Erdős-Rényi graphs uniformly with a constant number of rejections on average.
Maximilien Danisch, Ioannis Panagiotas, Lionel Tabourier
In order to manage massive graphs in practice, it is often necessary to resort to graph compression, which aims at reducing the memory used when storing and processing the graph. Efficient compression methods have been proposed in the literature, especially for web graphs. In most cases, they are combined with a vertex reordering pre-processing step which significantly improves the compression rate. However, these techniques are not as efficient when considering other kinds of graphs. In this paper, we focus on the class of bipartite graphs and adapt the vertex reordering phase to their specific structure by proposing a dual reordering scheme. By reordering each group of vertices with the aim of minimizing a specific score, we show that we can reach better compression rates. We also suggest that this approach can be further refined to make the node orderings more adapted to the compression phase that follows the ordering phase.
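A common proxy for the quantity such reorderings try to minimize is the log-gap cost of adjacency lists (an illustrative score, not necessarily the one used in the paper):

```python
from math import log2

def log_gap_cost(adjacency):
    """Estimate delta-coded size: small gaps between consecutive
    neighbor ids are cheaper, so good orderings lower this cost."""
    cost = 0.0
    for neighbors in adjacency:
        prev = 0
        for x in sorted(neighbors):
            cost += log2(abs(x - prev) + 1) + 1
            prev = x
    return cost
```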
Nouamane Arhachoui, Esteban Bautista, Maximilien Danisch, Anastasios Giovanidis
Measuring the influence of users in social networks is key for numerous applications. A recently proposed influence metric, coined the $\psi$-score, goes beyond traditional centrality metrics, which only assess structural graph importance, by further incorporating the rich information provided by the posting and re-posting activity of users. The $\psi$-score is in fact shown to generalize PageRank to non-homogeneous node activity. Despite its significance, it scales poorly to large datasets: for a network of N users it requires solving N linear systems of equations of size N. To address this problem, this work introduces a novel scalable algorithm for the fast approximation of the $\psi$-score, named Power-$\psi$. The proposed algorithm is based on a novel equation indicating that it suffices to solve one system of equations of size N to compute the $\psi$-score. Our algorithm then exploits the fact that such a system can be recursively approximated, in a distributed manner, to any desired error. This permits the $\psi$-score, which summarizes both structural and behavioral information for the nodes, to run as fast as PageRank. We validate the effectiveness of the proposed algorithm on several real-world datasets.
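The recursion that makes this possible is of the same family as PageRank's power iteration, sketched below (a generic sketch; the $\psi$-score update itself is the paper's contribution):

```python
import numpy as np

def pagerank_power(A, alpha=0.85, tol=1e-9):
    """Power iteration for PageRank; A[i, j] = 1 if j links to i,
    assuming every node has at least one out-link."""
    n = A.shape[0]
    P = A / A.sum(axis=0, keepdims=True)  # column-stochastic
    x = np.full(n, 1.0 / n)
    while True:
        x_new = alpha * (P @ x) + (1 - alpha) / n
        if np.abs(x_new - x).sum() < tol:
            return x_new
        x = x_new
```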
June 27th, 2022, 11am
Room: 24-25/405
Data streams pose several challenges for learning algorithms, including, but not limited to, restricted resources (in terms of memory and processing time), high dimensionality, and concept drift. To process these evolving data, we need efficient and accurate techniques and strategies, such as window models, summarization techniques, and other ways to restrict storage to a part of the stream, and/or to synopsis information from it, instead of maintaining it in its entirety. This talk will present how such challenges can be addressed and how we can reduce machine learning algorithms' resource costs while maintaining good accuracy.
Esteban Bautista, Matthieu Latapy
In Social Network Analysis and Mining, 2022, vol. 12, no 1, p. 1-11.
The personalized PageRank algorithm is one of the most versatile tools for the analysis of networks. In spite of its ubiquity, maintaining personalized PageRank vectors when the underlying network constantly evolves is still a challenging task. To address this limitation, this work proposes a novel distributed algorithm to locally update personalized PageRank vectors when the graph topology changes. The proposed algorithm is based on the use of Chebyshev polynomials and a novel update equation that encompasses a large family of PageRank-based methods. In particular, the algorithm has the following advantages: (i) it has faster convergence speed than state-of-the-art alternatives for local PageRank updating; and (ii) it can update the solution of recent generalizations of PageRank for which no updating algorithms have been developed. Experiments in a real-world temporal network of an autonomous system validate the effectiveness of the proposed algorithm.
April 4th, 2022, 11am
Room: 24-25/405
While there is a great deal of work on designing centrality measures, the mainstream does not exploit the network's community structure. Nevertheless, communities are pervasive in many real-world networks. A community is generally understood as a group of nodes densely connected to each other and sparsely connected to other nodes. As communities play a significant role in understanding how nodes behave in networks, a research area concerned with the relation between community structure and the importance of nodes has recently emerged in network science. These works have shown that incorporating community structure information allows designing more effective centrality measures. We refer to them as "community-aware" centrality measures. In this talk, we shed light on how classical (i.e., community-agnostic) centrality measures relate to community-aware centrality measures given a network's macroscopic and mesoscopic topology. Then, we show the subtlety of using these measures in different dynamic models, namely the Susceptible-Infected-Recovered (SIR) model and the Linear Threshold (LT) model. Additionally, although there are plenty of works on detecting overlapping communities, few make use of the overlapping community structure to identify critical nodes. Indeed, in many situations nodes may belong to several communities, indicating an overlapping community structure. We propose a framework, inspired by the concept of vitality, to target influential nodes in networks with an overlapping community structure. Finally, owing to the significance of communities in real-world networks, we present a backbone extraction method that maintains the network's modularity while substantially reducing its original size.
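As a minimal illustration of what "community-aware" means (our own toy score, not one from the talk), a node's degree can be split into intra- and inter-community parts:

```python
import networkx as nx

def community_degrees(G, communities):
    """For each node, (intra, inter) degree w.r.t. a given partition."""
    comm = {v: i for i, c in enumerate(communities) for v in c}
    scores = {}
    for v in G:
        intra = sum(1 for u in G[v] if comm[u] == comm[v])
        scores[v] = (intra, G.degree(v) - intra)
    return scores
```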