Understanding information diffusion on complex networks is a key issue from a theoretical and applied perspective. Epidemiology-inspired SIR models have been proposed to model information diffusion. Recent papers have analyzed this question from a data-driven perspective, using on-line diffusion data. We follow this approach, investigating if epidemic models, calibrated with a systematic procedure, are capable of reproducing key structural properties of spreading cascades.
We first identified a large-scale, rich dataset from which we can reconstruct the diffusion trail and the underlying network. Secondly, we examine the simple SIR model as a baseline model and conclude that it was unable to generate structurally realistic spreading cascades. We extend this result examining model extensions which take into account heterogeneities observed in the data. In contrast, similar models which take into account temporal patterns (which can be estimated with the interaction data) generate more similar cascades qualitatively. Although one key property was not reproduced in any model, this result highlights the importance of temporal patterns to model diffusion phenomena.
We have also analyzed the impact of the underlying network topology on synthetic spreading cascade structure. We have simulated spreading cascades in similar conditions as the real cascades observed in our dataset, namely, with the same time constraints and with the same "seeds". Using a sequence of uniformly random graphs derived from the real graph and with increasing structure complexity, we have examined the impact of key topological properties for the models presented previously. We show that in our setting, the distribution of the number of neighbors of seed nodes is the most impacting property among the investigated ones.