+DS vLE: An Introduction to Text Analysis


Chris Bail

This seminar will provide an introduction to text analysis. Text-based data abound on social media platforms, in digital archives, and elsewhere, but they are highly unstructured, which poses numerous challenges for modeling. We will discuss basic concepts in text analysis (e.g., tokenization, n-grams, and creating a document-term matrix). The seminar will briefly introduce dictionary-based methods of text analysis and conclude by preparing students for more advanced topics such as Latent Dirichlet Allocation and Word2Vec. This session is part of the Duke+DataScience (+DS) program's virtual learning experiences (vLEs). To learn more, please visit
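
As a rough illustration of the basic concepts named in the abstract (tokenization, n-grams, and a document-term matrix), here is a minimal sketch in plain Python. It is not taken from the seminar materials: the function names `tokenize`, `ngrams`, and `document_term_matrix`, the simple regex tokenizer, and the two example documents are all assumptions made for illustration.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into word tokens (a deliberately simple rule)."""
    return re.findall(r"[a-z0-9']+", text.lower())


def ngrams(tokens, n):
    """Return every run of n consecutive tokens as a tuple."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def document_term_matrix(docs):
    """Build a vocabulary and a matrix of raw term counts.

    Each row corresponds to a document and each column to a term
    in the alphabetically sorted vocabulary.
    """
    tokenized = [tokenize(doc) for doc in docs]
    vocab = sorted({token for tokens in tokenized for token in tokens})
    counts = [Counter(tokens) for tokens in tokenized]
    matrix = [[count[term] for term in vocab] for count in counts]
    return vocab, matrix


# Hypothetical example documents, invented for this sketch.
docs = [
    "Text data abounds on social media.",
    "Social media text is unstructured.",
]

vocab, dtm = document_term_matrix(docs)
bigrams = ngrams(tokenize(docs[0]), 2)

print(vocab)
print(dtm)
print(bigrams)
```

In practice, a dedicated text analysis library would typically handle these steps, along with choices such as stop-word removal and stemming that this sketch leaves out.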