Projects


#néo: Learning temporal vector representation of words from the Web
MABED: Detecting significant events in social media
CATS: An online tool for collecting and analyzing large scale corpora of tweets
TOM: A library for topic modeling and browsing
T-BASIC: Modeling information diffusion in social media
SONDY: An open source social media data mining software (event detection + influence analysis)

Supervision


Geoffrey Guettier (Master - INSA Lyon, 2017): Learning vector representations of words
Robin Brochier (PhD - Université Lyon 2, 2016 - 2019): Recommender systems for research papers (with Julien Velcin and
Anthony Deseille (Master - Université Lyon 1, 2016): Part-of-speech tagging for Twitter

Talks & seminars

Modeling the meaning of words, 6th Mathematics Week (Semaine des mathématiques): Mathematics and Language, 2017
Collecting and analyzing tweets with CATS, seminar of the Centre de Recherche en Traduction et Terminologie (CRTT), 2016
Modeling language, talk at the CEStat day, 2016
Learning vector representations of words, scientific talk of the Data Mining & Decision team of the ERIC laboratory, 2016
Event detection and identification of influential users in social media, seminar of the Viseo R&D team, 2016
Collection and Analysis of Tweets made Simple with CATS, half-day thematic workshop: Clustering of dynamic textual data, 2016
Event detection in social media: the MABED method, seminar of the Laboratoire d'Informatique d'Avignon (LIA), 2015
Event detection in social media: the MABED method, seminar of the ADVANSE team of the Laboratoire d'Informatique, de Robotique et de Microélectronique de Montpellier (LIRMM), 2015
Event detection in social media with the SONDY software, seminar of the Institut des Systèmes Complexes de Paris Île-de-France (ISC-PIF), 2014
Information diffusion in social media, seminar of the Équipe de Recherche en Ingénierie des Connaissances (ERIC), 2012

PhD



Social media have greatly modified the way we produce, diffuse and consume information, and have become powerful information vectors. The goal of this thesis is to contribute to the understanding of information diffusion in social media by providing means of modeling and analysis.

First, we propose MABED (Mention-Anomaly-Based Event Detection), a statistical method for automatically detecting the events that most interest social media users from the stream of messages they publish. Unlike existing methods, it does not rely solely on the textual content of messages but also leverages the frequency of social interactions between users. Secondly, we propose T-BASIC (Time-Based ASynchronous Independent Cascades), a probabilistic model, based on the network structure underlying social media, for predicting information diffusion, and more specifically how the number of users who relay a given piece of information evolves over time. Unlike similar network-based models, the probability that a piece of information propagates from one user to another is not fixed but depends on time. We also describe a procedure for inferring the latent parameters of the model, which we formulate as functions of observable characteristics of social media users. Thirdly, we propose SONDY (SOcial Network DYnamics), free and extensible software that implements state-of-the-art methods for mining the data generated by social media, i.e. the messages published by users and the structure of the social network that connects them. Whereas existing academic tools focus on analyzing either the messages or the network, SONDY allows the joint analysis of both types of data by analyzing influence with respect to each detected event.
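To make the idea of a mention-based anomaly more concrete, here is a minimal Python sketch. It is not the statistic defined in the dissertation. It assumes a simple baseline in which tweets containing a mention (a token starting with `@`) contain a given word at a constant rate across all time slices, and it reports, for each slice, the observed count minus the count expected under that baseline. The function name `mention_anomaly`, the whitespace tokenization and the input layout (a list of time slices, each a list of tweet strings) are hypothetical choices made for illustration.

```python
def mention_anomaly(slices, word):
    """Per-slice anomaly of `word` among tweets that contain a mention.

    slices: list of time slices, each a list of tweet strings.
    Returns observed - expected counts, one value per slice.
    Assumption (illustrative only): under the baseline, a mention-containing
    tweet includes `word` at the same rate in every slice.
    """
    word = word.lower()

    def has_mention(tweet):
        # A mention is approximated as a token of the form "@something".
        return any(tok.startswith("@") and len(tok) > 1 for tok in tweet.split())

    def has_word(tweet):
        return word in tweet.lower().split()

    # Number of mention-containing tweets in each slice.
    mention_counts = [sum(1 for t in s if has_mention(t)) for s in slices]
    # Number of mention-containing tweets that also contain the word.
    observed = [sum(1 for t in s if has_mention(t) and has_word(t)) for s in slices]

    total_mentions = sum(mention_counts)
    if total_mentions == 0:
        return [0.0] * len(slices)

    # Global rate at which mention-containing tweets include the word.
    rate = sum(observed) / total_mentions
    return [obs - n * rate for obs, n in zip(observed, mention_counts)]
```

A sustained run of positive values over consecutive slices would suggest a burst of social interaction around the word. How such bursts are turned into events (choice of time interval, related words, ranking) is outside the scope of this sketch.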

The experiments, conducted on data collected from Twitter, demonstrate the relevance of our proposals and shed light on properties that give us a better understanding of the mechanisms underlying information diffusion. First, we compare the performance of MABED against that of methods from the literature and find that taking into account the frequency of social interactions between users leads to more accurate event detection and improved robustness in the presence of noisy content. We also show that MABED helps with the interpretation of detected events by providing clearer textual descriptions and more precise temporal descriptions. Secondly, we demonstrate the relevance of the procedure we propose for estimating the pairwise diffusion probabilities on which T-BASIC relies. To do so, we illustrate the predictive power of users' characteristics, and compare the performance of our method for estimating the diffusion probabilities against that of state-of-the-art methods. We show the importance of having non-constant diffusion probabilities, which makes it possible to incorporate the variation of users' level of receptivity through time into T-BASIC. We also study how, and to what extent, the social, topical and temporal characteristics of users impact information diffusion. Thirdly, we illustrate with various scenarios the usefulness of SONDY, both for non-experts, thanks to its advanced user interface and adapted visualizations, and for researchers, thanks to its application programming interface.
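The role of non-constant diffusion probabilities can be illustrated with a small simulation. The Python sketch below is a generic discrete-time independent cascade in which the probability of transmission is supplied as a function of the sender, the receiver and the time step. It is not the T-BASIC model itself and does not include its parameter inference procedure. The names `simulate_cascade`, `followers` and `prob` are hypothetical, and the rule that each newly active user gets a single chance to activate each follower is a simplifying assumption.

```python
import random


def simulate_cascade(followers, seeds, prob, steps, rng=None):
    """Simulate a cascade with time-dependent transmission probabilities.

    followers: dict mapping a user to the users who can receive from them.
    seeds: users who relay the information at time 0.
    prob: function (sender, receiver, t) -> probability in [0, 1].
    steps: number of time steps to simulate.
    Returns the cumulative number of active users at t = 0, 1, ..., steps.
    """
    rng = rng or random.Random(0)  # seeded for reproducibility
    active = set(seeds)
    frontier = set(seeds)  # users activated at the previous step
    volume = [len(active)]

    for t in range(1, steps + 1):
        newly_active = set()
        for sender in sorted(frontier):
            for receiver in followers.get(sender, ()):
                if receiver in active or receiver in newly_active:
                    continue
                # The transmission probability depends on the current time.
                if rng.random() < prob(sender, receiver, t):
                    newly_active.add(receiver)
        active |= newly_active
        frontier = newly_active
        volume.append(len(active))

    return volume
```

With a constant `prob`, this reduces to a standard independent cascade. Passing a function that varies with `t` (for example, lower values during hours when users are less receptive) changes the shape of the resulting volume curve, which is the kind of effect the experiments above examine.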


My PhD dissertation, titled "Diffusion de l'information dans les médias sociaux : modélisation et analyse" (Information diffusion in social media: modeling and analysis), is shared under a Creative Commons Attribution 4.0 International license: download it, or take a look at the slides.

Diffusion de l'information dans les médias sociaux : modélisation et analyse by Adrien Guille is licensed under a Creative Commons Attribution-NoDerivatives 4.0 International License.