TOM: Topic modeling and browsing

About

TOM (TOpic Modeling) is a Python library for topic modeling and browsing. Its goal is to support the efficient analysis of a text corpus from start to finish through the discovery of latent topics. To this end, TOM provides advanced functions for preparing and vectorizing a text corpus. It also offers a unified interface to two topic models: LDA, using either variational inference or Gibbs sampling, and NMF, using alternating least squares with a projected gradient method. It implements three state-of-the-art methods for estimating the optimal number of topics for a corpus. In addition, TOM builds an interactive Web-based browser that makes it easy to explore a topic model and the related corpus.
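The vectorize-then-factorize pipeline described above can be illustrated with a minimal, standard-library-only sketch. This is not TOM's API: the function names (`tfidf`, `nmf`, `top_words`), the toy corpus, and the tf-idf weighting are all assumptions made for illustration. The NMF step uses the simple multiplicative update rules rather than the alternating least squares with projected gradient that TOM is described as using.

```python
import math
import random
from collections import Counter


def tfidf(docs):
    """Return (vocabulary, matrix) with one tf-idf weighted row per document."""
    tokenized = [doc.lower().split() for doc in docs]
    vocab = sorted({word for tokens in tokenized for word in tokens})
    # Document frequency: number of documents that contain each word.
    df = Counter(word for tokens in tokenized for word in set(tokens))
    matrix = []
    for tokens in tokenized:
        counts = Counter(tokens)
        matrix.append(
            [counts[w] * (math.log(len(docs) / df[w]) + 1.0) for w in vocab]
        )
    return vocab, matrix


def transpose(a):
    return [list(col) for col in zip(*a)]


def matmul(a, b):
    cols = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in a]


def frobenius_error(v, w, h):
    """Frobenius norm of V - WH."""
    wh = matmul(w, h)
    return math.sqrt(
        sum((x - y) ** 2 for rv, rwh in zip(v, wh) for x, y in zip(rv, rwh))
    )


def nmf(v, k, n_iter=200, seed=0, eps=1e-9):
    """Factorize V (docs x words) into W (docs x k) and H (k x words).

    Uses multiplicative updates (an assumption for this sketch,
    not the solver used by TOM).
    """
    rng = random.Random(seed)
    n, m = len(v), len(v[0])
    w = [[rng.random() + eps for _ in range(k)] for _ in range(n)]
    h = [[rng.random() + eps for _ in range(m)] for _ in range(k)]
    for _ in range(n_iter):
        # H <- H * (W^T V) / (W^T W H)
        wt = transpose(w)
        num, den = matmul(wt, v), matmul(matmul(wt, w), h)
        h = [[h[i][j] * num[i][j] / (den[i][j] + eps) for j in range(m)]
             for i in range(k)]
        # W <- W * (V H^T) / (W H H^T)
        ht = transpose(h)
        num, den = matmul(v, ht), matmul(w, matmul(h, ht))
        w = [[w[i][j] * num[i][j] / (den[i][j] + eps) for j in range(k)]
             for i in range(n)]
    return w, h


def top_words(h, vocab, n=3):
    """List the n highest-weighted words of each topic (row of H)."""
    return [
        [vocab[j] for j in sorted(range(len(vocab)), key=lambda j: -row[j])[:n]]
        for row in h
    ]


# Hypothetical toy corpus, for illustration only.
docs = [
    "topic model latent topic",
    "latent topic model corpus",
    "web browser interactive web",
    "interactive web browser page",
]
vocab, matrix = tfidf(docs)
w, h = nmf(matrix, k=2)
for index, words in enumerate(top_words(h, vocab)):
    print(f"topic {index}: {' '.join(words)}")
```

Each row of `h` holds the word weights of one topic, and each row of `w` holds the topic weights of one document. A topic browser needs exactly these two views to link topics to words and documents to topics.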

Publication

Adrien Guille, Pavel Soriano (2016) TOM: A library for topic modeling and browsing
Actes de la conférence française sur l'Extraction et la Gestion des Connaissances (EGC), pp. 451-456

Code

TOM is distributed via GitHub under the terms of the MIT licence.

Demo

Check out the EGC anthology browser that was automatically generated with TOM.

Screenshots of the browser: topic cloud, topic details, document details.