
TOM: Topic modeling and browsing

About

TOM (TOpic Modeling) is a Python library for topic modeling and browsing. It aims to support the efficient analysis of a text corpus from start to finish through the discovery of latent topics. To this end, TOM provides advanced functions for preparing and vectorizing a text corpus. It also offers a unified interface to two topic models: LDA (Latent Dirichlet Allocation), fitted with either variational inference or Gibbs sampling, and NMF (Non-negative Matrix Factorization), fitted with alternating least squares using a projected gradient method. It implements three state-of-the-art methods for estimating the optimal number of topics for a corpus. In addition, TOM builds an interactive web-based browser that makes it easy to explore a topic model and the related corpus.
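The workflow described above (vectorize a corpus, factorize it into topics, list the top words per topic) can be illustrated with a minimal sketch that uses only the Python standard library. This is not TOM's API: every function name below (`tfidf_matrix`, `nmf`, `top_words`) is hypothetical, the toy corpus is made up for illustration, and the factorization uses simple multiplicative updates rather than the alternating least squares with projected gradient that TOM relies on.

```python
import math
import random
from collections import Counter


def tfidf_matrix(docs):
    """Vectorize documents into a TF-IDF matrix (one row per document)."""
    tokenized = [doc.lower().split() for doc in docs]
    vocab = sorted({word for doc in tokenized for word in doc})
    doc_freq = Counter(word for doc in tokenized for word in set(doc))
    rows = []
    for doc in tokenized:
        tf = Counter(doc)
        rows.append([tf[w] * (math.log(len(docs) / doc_freq[w]) + 1.0)
                     for w in vocab])
    return rows, vocab


def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    cols = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols]
            for row in a]


def transpose(m):
    return [list(col) for col in zip(*m)]


def frobenius_error(v, w, h):
    """Frobenius norm of V - WH, the reconstruction error."""
    wh = matmul(w, h)
    return math.sqrt(sum((v[i][j] - wh[i][j]) ** 2
                         for i in range(len(v)) for j in range(len(v[0]))))


def nmf(v, n_topics, n_iter=200, seed=0, eps=1e-9):
    """Factorize V (docs x words) into non-negative W (docs x topics)
    and H (topics x words) with multiplicative updates.
    Assumption: a simple stand-in, not the solver used by TOM."""
    rng = random.Random(seed)
    n_docs, n_words = len(v), len(v[0])
    w = [[rng.random() + 0.1 for _ in range(n_topics)] for _ in range(n_docs)]
    h = [[rng.random() + 0.1 for _ in range(n_words)] for _ in range(n_topics)]
    for _ in range(n_iter):
        # H <- H * (W^T V) / (W^T W H), element-wise
        wt = transpose(w)
        num, den = matmul(wt, v), matmul(matmul(wt, w), h)
        h = [[h[k][j] * num[k][j] / (den[k][j] + eps)
              for j in range(n_words)] for k in range(n_topics)]
        # W <- W * (V H^T) / (W H H^T), element-wise
        ht = transpose(h)
        num, den = matmul(v, ht), matmul(w, matmul(h, ht))
        w = [[w[i][k] * num[i][k] / (den[i][k] + eps)
              for k in range(n_topics)] for i in range(n_docs)]
    return w, h


def top_words(h, vocab, n=3):
    """Return the n highest-weighted words of each topic."""
    return [[vocab[j] for j in
             sorted(range(len(vocab)), key=lambda j: -row[j])[:n]]
            for row in h]


# Toy corpus, invented for this illustration only.
docs = [
    "topic model corpus latent topic",
    "latent topic model inference",
    "web browser interactive page",
    "interactive web page browser corpus",
]
v, vocab = tfidf_matrix(docs)
w, h = nmf(v, n_topics=2)
for index, words in enumerate(top_words(h, vocab)):
    print(f"topic {index}: {', '.join(words)}")
```

Each row of `h` gives the weight of every vocabulary word in one topic, and each row of `w` gives the weight of every topic in one document. Comparing `frobenius_error` across several values of `n_topics` is a crude way to explore how many topics a corpus supports; the three estimation methods that TOM implements are more principled and are not reproduced here.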

Publication

Adrien Guille, Pavel Soriano (2016). TOM: A library for topic modeling and browsing. Actes de la Conférence Francophone sur l'Extraction et la Gestion des Connaissances (EGC), pp. 451-456.

Code

TOM is distributed via GitHub under the terms of the MIT licence.

Demo

Check out the EGC anthology browser that was automatically generated with TOM.