Welcome to the 2019 DMD Workshop

Organized by:

Sponsored by:

For several years, machine learning algorithms relied heavily on hand-crafted features, which made these classic approaches highly sensitive to data representations shaped by human expertise and tied to specific applications. Recent advances in machine learning instead aim to automatically learn the best possible representation directly from the data. This representation can then be leveraged to solve downstream problems such as object recognition, information retrieval, and automatic translation. Representation learning (RL) has been especially fostered by the success of deep learning on various kinds of data: images, videos, texts, graphs, etc. However, its scope clearly extends beyond artificial neural network architectures, and it can also be addressed with other techniques such as matrix factorization.

This workshop aims to gather researchers from the different fields interested in the development of RL, such as machine learning, information retrieval, natural language processing, computer vision, and data mining. We invite researchers from both industry and academia to join forces in this exciting area. We intend to discuss recent and significant developments in RL and to promote cross-fertilization of techniques. This event is an initiative of the Data Mining & Decision team of the ERIC Lab, Université de Lyon.

The workshop is a one-day event with six keynote speakers and a poster session during the lunch break. The poster session is intended to foster discussion between young researchers who are beginning to work with such techniques and experts in the field. Lunch is offered to registered attendees. Registration is free but mandatory.

Schedule

________________________

Workshop opening

9:00 - 9:30

Julien Velcin

Laboratoire ERIC, Université Lumière Lyon 2



____________________

Morning session

9:30 - 12:00

Benjamin Roth

Center for Information and Language Processing, Munich University

9:30 - 10:30

Representation learning for NLP: an interface for inference across modalities

Abstract
Slides

Navid Rekabsaz

Natural Language Understanding Laboratory, Idiap Research Institute

10:30 - 11:30

[Canceled] Representation learning for information retrieval: term matching, term saliency, and bias

Abstract

Adrien Guille

Laboratoire ERIC, Université Lumière Lyon 2

11:30 - 12:00

Recent advances in attributed network embedding

Abstract
Slides


______________________

Lunch break and Poster session

12:00 - 13:30

List of accepted posters
______________________

Afternoon session

13:30 - 16:30

Ludovic Denoyer

Facebook Artificial Intelligence Research & Laboratoire d'Informatique de Paris 6, Sorbonne Université

13:30 - 14:30

Learning disentangled representation with weak supervision

Abstract
Slides

Christophe Gravier

Laboratoire Hubert Curien, Université Jean Monnet

14:30 - 15:30

Neural networks for NLP: can structured knowledge help?

Abstract
Slides

Eloi Zablocki

Laboratoire d'Informatique de Paris 6, Sorbonne Université

15:30 - 16:30

Grounded Representation Learning for NLP and Applications to Zero-Shot Learning

Abstract
Slides

Detailed program

Benjamin Roth

Center for Information and Language Processing, Munich University

9:30 - 10:30

Representation learning for NLP: an interface for inference across modalities

Many of the success stories in natural language processing (NLP) rely on good learned feature representations. In my talk, I will give an overview of the recent history of representation learning and illustrate common and interesting approaches with prototypical use cases. Early approaches modeled words as the main information-bearing units, with representations estimated from corpus co-occurrences. Later approaches captured subword-level information, from characters or automatically segmented units. The latest wave of improvements for representation in NLP uses larger text spans to represent words not as atomic units, but to find contextualized word representations, i.e., to provide representations of words in the context they appear in. These modern representation-learning algorithms are resource- and data-hungry, and current state-of-the-art algorithms for contextualized word representations have perfected the art of training good representations from large amounts of unlabeled data. Natural language information is usually not isolated from information in other modalities. In the second part of my talk, I will draw the connection to representation learning with structured data and show how joint embeddings of language and structured knowledge can facilitate reasoning across domains. Current research on interpretable reasoning over text and knowledge graphs raises the question of how inferences can be drawn in a transparent way across modalities.
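For readers unfamiliar with the distinction between static and contextualized word representations discussed above, the following minimal Python sketch (not part of the talk) illustrates it. It assumes the Hugging Face transformers library (a recent 4.x version) and the pretrained bert-base-uncased model; these choices are illustrative assumptions, not prescribed by the speaker.

# Minimal sketch: a contextualized model gives "bank" a different vector in each
# sentence, unlike a static (word2vec/GloVe-style) lookup table.
import torch
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased")
model.eval()

sentences = ["She sat on the bank of the river.",
             "He deposited the check at the bank."]

with torch.no_grad():
    for text in sentences:
        inputs = tokenizer(text, return_tensors="pt")
        hidden = model(**inputs).last_hidden_state[0]   # (seq_len, hidden_size)
        tokens = tokenizer.convert_ids_to_tokens(inputs["input_ids"][0].tolist())
        bank_vec = hidden[tokens.index("bank")]         # context-dependent vector
        print(text, bank_vec[:3])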

Navid Rekabsaz

Natural Language Understanding Laboratory, Idiap Research Institute

10:30 - 11:30

Representation learning for information retrieval: term matching, term saliency, and bias

Representation learning (RL) suggests a computational model to capture the subtleties of language by providing vectors as proxies for the semantics of linguistic entities. RL has become a cornerstone of several language and text processing areas, in particular information retrieval (IR). In this talk, I first review the principles of "classical" term-matching-based IR models, followed by an introduction to the Generalized/Extended Translation Models, two recent methods to exploit semantic similarities in classical IR models. Supported by recent releases of several large-scale IR collections, I explain in detail the novel advances of document retrieval models with RL, focusing on soft term matching as well as term saliency. Finally, I briefly discuss the interpretability of word2vec Skip-Gram, a neural word representation model, followed by a presentation of two applications of the introduced methods to financial sentiment analysis and gender bias quantification.
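As a rough, self-contained illustration of the soft term-matching idea mentioned in this abstract (and not the speaker's actual models), the sketch below scores a document by the best embedding similarity of each query term against the document's terms instead of requiring exact overlap. The embeddings are random placeholders standing in for word2vec Skip-Gram vectors, and the aggregation rule is just one common choice.

# Illustrative sketch of soft term matching with embedding cosine similarity.
import numpy as np

rng = np.random.default_rng(0)
vocab = ["bank", "finance", "river", "loan", "water"]
emb = {w: rng.normal(size=50) for w in vocab}          # placeholder embeddings

def cosine(u, v):
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

def soft_match_score(query_terms, doc_terms):
    # For each query term, take its best similarity to any document term,
    # then average over the query terms.
    scores = [max(cosine(emb[q], emb[d]) for d in doc_terms) for q in query_terms]
    return sum(scores) / len(scores)

print(soft_match_score(["bank", "loan"], ["finance", "river", "water"]))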

Ludovic Denoyer

Facebook Artificial Intelligence Research & Laboratoire d'Informatique de Paris 6, Sorbonne Université

13:30 - 14:30

Learning disentangled representation with weak supervision

In this talk, I will focus on the problem of controlling factors of variation by learning disentangled representations. In particular, I will show that controlling these factors of variation can be learned using only weakly supervised datasets in which these factors are not explicit and thus cannot be used in a classical supervised learning setting. I will describe two approaches, one based on adversarial losses and one based on back-translation techniques, and I will present different variants on both image processing and text rewriting problems.
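To make the adversarial-loss idea concrete, here is a toy Python/PyTorch sketch, not the speaker's method: all architectures, dimensions, and loss weights are placeholder assumptions. A discriminator tries to recover an attribute from the latent code, and the encoder is trained to make that recovery fail while still reconstructing the input, so the latent code becomes disentangled from the attribute.

# Toy sketch: adversarial disentanglement of a latent code z from an attribute y.
import torch
import torch.nn as nn

X_DIM, Z_DIM, N_ATTR = 784, 32, 10                     # placeholder sizes
enc = nn.Sequential(nn.Linear(X_DIM, 128), nn.ReLU(), nn.Linear(128, Z_DIM))
dec = nn.Sequential(nn.Linear(Z_DIM + N_ATTR, 128), nn.ReLU(), nn.Linear(128, X_DIM))
disc = nn.Sequential(nn.Linear(Z_DIM, 64), nn.ReLU(), nn.Linear(64, N_ATTR))

opt_ae = torch.optim.Adam(list(enc.parameters()) + list(dec.parameters()), lr=1e-3)
opt_d = torch.optim.Adam(disc.parameters(), lr=1e-3)
ce = nn.CrossEntropyLoss()

def train_step(x, y, y_onehot, lam=1.0):
    # 1) discriminator step: learn to predict the attribute y from the latent
    loss_d = ce(disc(enc(x).detach()), y)
    opt_d.zero_grad(); loss_d.backward(); opt_d.step()

    # 2) autoencoder step: reconstruct x from (z, y) while fooling the discriminator
    z = enc(x)
    recon = ((dec(torch.cat([z, y_onehot], dim=1)) - x) ** 2).mean()
    adv = -ce(disc(z), y)                              # push disc away from y
    loss = recon + lam * adv
    opt_ae.zero_grad(); loss.backward(); opt_ae.step()
    return recon.item(), loss_d.item()

Because the decoder receives the attribute explicitly, changing y at generation time changes that factor of variation while the rest of the content, carried by z, stays fixed.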

Christophe Gravier

Laboratoire Hubert Curien, Université Jean Monnet

14:30 - 15:30

Neural networks for NLP: can structured knowledge help?

Many modern Natural Language Processing pipelines share a two-step approach. First, a neural network, historically shallow, is trained to learn word representations on large corpora. The two major trends are to learn context-free or contextual word embeddings; in both cases, they have gained major attention in the community because they are learnt in an unsupervised setting while outperforming other alternatives. The second stage of the pipeline is to feed another (usually deep) neural network dedicated to a downstream NLP task, such as document classification, machine comprehension, or generative models, to name a few. Such networks benefit from advances in deep learning in general and also bring new paradigms to NLP (e.g. the attention mechanism and its derivatives). My talk will emphasize how neural models in NLP can benefit from external structured knowledge, and I will especially look at some of our recent work in these areas. In the case of word representations, this includes a very simple approach to learning context-free word embeddings that outperforms popular frameworks such as GloVe, FastText and Word2Vec, including for rare words. As for downstream networks, we will discuss how to leverage knowledge bases for Natural Language Generation models, especially for zero-shot question generation and learning to generate textual summaries.

Eloi Zablocki

Laboratoire d'Informatique de Paris 6, Sorbonne Université

15:30 - 16:30

Grounded Representation Learning for NLP and Applications to Zero-Shot Learning

Representing language semantics is a long-standing problem for the natural language processing community, and leveraging visual information is crucial to further improve traditional approaches towards that goal. This talk describes our efforts in this direction: we present a method that uses images, along with textual data, to learn efficient word representations by considering the visual context of objects and their spatial organization in scenes. Conversely, as language contains high-level knowledge, linguistic representations can be used to assist and augment the capabilities of computer vision recognition systems. In the second part of this talk, we will thus focus on the zero-shot learning recognition task, which consists in recognizing objects that have never been seen, thanks to linguistic knowledge acquired about the objects beforehand. We present a model for zero-shot recognition that leverages (1) the region of interest, (2) the semantic representations of objects, and (3) the visual context of an object.


Call for posters

The main objective of this workshop is to provide an overview of recent research conducted in various fields (NLP, information retrieval, machine learning, computer vision...) that relates to representation learning. The program includes keynote presentations from invited speakers and a poster session. The poster session is intended to foster discussion between young researchers who are beginning to work with such techniques and experts in the field.

Please note that we do not plan to publish proceedings for this workshop.

Topics of interest

  • unsupervised, semi-supervised, and supervised representation learning
  • metric/kernel learning
  • different kinds of embeddings (word, sentence, document, graph...)
  • visualization/interpretation of learned representations
  • applications in various fields (vision, audio, speech, NLP...)
  • demonstrations

Program Committee

  • Alexandre Allauzen : LIMSI, Université Paris-Sud
  • Isabelle Bloch : LTCI, Telecom ParisTech
  • Rémi Cazabet : LIRIS, Université Claude Bernard, Lyon
  • Vincent Claveau : IRISA, CNRS
  • Nicolas Dugué : LIUM, Le Mans Université
  • Adrien Guille : ERIC, Université Lumière Lyon 2
  • Amaury Habrard : LabHC, Université Jean Monnet, Saint-Etienne
  • Julien Jacques : ERIC, Université Lumière Lyon 2
  • Christine Largeron : LabHC, Université Jean Monnet, Saint-Etienne
  • Benjamin Piwowarski : LIP6, Sorbonne Université, Paris, CNRS
  • Julien Velcin : ERIC, Université Lumière Lyon 2

Important Dates

  • Submission Deadline: April 1, 2019
  • Notification Date: April 15, 2019
  • Workshop Date: May 24, 2019

List of accepted posters

  • Alloy Design and Optimization (Mariam Assi, Mines St-Etienne and Université de Nantes, LGF)
  • Using document embeddings for novelty detection (Clément Christophe, Université Lumière Lyon 2, ERIC, with EDF R&D)
  • Link Prediction with Mutual Attention for Text-Attributed Networks (Robin Brochier, Université Lumière Lyon 2, ERIC, with DSRT)
  • Document embedding with pretrained word embeddings (Antoine Gourru, Université Lumière Lyon 2, ERIC)
  • Autoencoding any Data through Kernel Autoencoders (Pierre Laforgue, Telecom ParisTech, LTCI)
  • Using Text and Image for Topic Detection on Twitter (Béatrice Mazoyer, CentraleSupélec, SciencesPo Paris, with INA)
  • Electricity consumption forecasting using textual data (David Obst, Université d'Aix-Marseille, I2M, with EDF R&D)

Location

IUT Lumière Lyon 2

Grand amphithéâtre - Bâtiment 1

160 Boulevard de l'Université, 69500 Bron

Organizers

This workshop is organized by the Data Mining & Decision team of the ERIC laboratory.

Julien Velcin - Laboratoire ERIC, Université Lumière Lyon 2, Université de Lyon
Adrien Guille - Laboratoire ERIC, Université Lumière Lyon 2, Université de Lyon