+DS vLE: An Introduction to Topic Modeling in R

Speaker

Chris Bail

This seminar will discuss one of the most popular techniques for automatically identifying latent, or hidden, themes in text: topic models. We will begin with a brief, high-level introduction to Latent Dirichlet Allocation (LDA) but spend most of the hour discussing how to write code to perform topic modeling on a corpus of political statements. This webinar will cover both conventional LDA and Structural Topic Modeling, a more recent technique that employs metadata to improve classification of documents according to latent themes or topics. This course assumes a basic working knowledge of R and familiarity with the content covered in an earlier "Introduction to Text Analysis" webinar, which covers text preprocessing and creating document-term matrices. This session is part of the Duke+DataScience (+DS) program virtual learning experiences (vLEs). To learn more, please visit https://plus.datascience.duke.edu