
Topic Modeling with Latent Semantic Analysis (LSA)

Overview

In this lab, you will use the BBC News dataset to perform topic modeling with Latent Semantic Analysis (LSA). The dataset contains news articles from three categories (topics), and you will apply LSA to uncover the latent topics in these articles. You’ll preprocess the text, build a term-document matrix, apply singular value decomposition (SVD), and interpret the results by labeling the discovered topics.
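For reference, the SVD step in LSA is usually written as the truncated factorization below. The symbols (A, m, n, r, k) are notation introduced here for illustration, not names taken from the lab template.

```latex
% A: term-document matrix with m terms (rows) and n documents (columns);
% A_{ij} is the weight of term i in document j (e.g. a raw count or TF-IDF).
A = U \Sigma V^{\top}, \qquad
U \in \mathbb{R}^{m \times r},\;
\Sigma = \operatorname{diag}(\sigma_1, \dots, \sigma_r),\;
V \in \mathbb{R}^{n \times r},\;
\sigma_1 \ge \dots \ge \sigma_r > 0,\; r = \operatorname{rank}(A)

% LSA keeps only the k largest singular values (k = number of topics):
A \approx A_k = U_k \Sigma_k V_k^{\top}
```

Column t of U_k holds every term's weight (loading) in topic t, so its largest-magnitude entries are a natural reading of a topic's "most significant terms". Row d of V_k Σ_k (equivalently, of A^T U_k) gives document d's coordinates in the k-dimensional topic space. By the Eckart–Young theorem, A_k is the closest rank-k matrix to A in the Frobenius norm.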


Goals

By the end of this lab, you will:

  1. Understand how to preprocess text documents for LSA.
  2. Apply SVD to a term-document matrix to identify latent topics.
  3. Label and interpret each topic based on its most significant (highest-weighted) terms.
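The three goals can be sketched end to end in plain NumPy. This is a minimal illustration under stated assumptions, not the method used in the lab template: the four sentences are a made-up toy corpus, the stop-word list is a tiny hypothetical one, and raw counts stand in for whatever term weighting the lab uses (TF-IDF is a common alternative). The helper names tokenize, term_document_matrix, and lsa_topics are hypothetical.

```python
import re
from collections import Counter

import numpy as np

# Hypothetical, deliberately tiny stop-word list; real preprocessing would use a fuller one.
STOP_WORDS = {"the", "a", "an", "and", "of", "to", "in", "on", "for", "is", "with", "after"}


def tokenize(text):
    """Lowercase, keep alphabetic runs, drop stop words and tokens shorter than 3 letters."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return [t for t in tokens if t not in STOP_WORDS and len(t) > 2]


def term_document_matrix(docs):
    """Return (vocab, A), where A[i, j] counts occurrences of vocab[i] in docs[j]."""
    token_lists = [tokenize(d) for d in docs]
    vocab = sorted({t for toks in token_lists for t in toks})
    row = {term: i for i, term in enumerate(vocab)}
    A = np.zeros((len(vocab), len(docs)))
    for j, toks in enumerate(token_lists):
        for term, count in Counter(toks).items():
            A[row[term], j] = count
    return vocab, A


def lsa_topics(docs, k=2, top_n=3):
    """Truncated SVD of the term-document matrix.

    Returns (topics, doc_coords): topics[t] lists the top_n terms with the largest
    absolute loading in column t of U_k; doc_coords (n_docs x k) equals V_k Sigma_k.
    """
    vocab, A = term_document_matrix(docs)
    U, s, Vt = np.linalg.svd(A, full_matrices=False)  # s is sorted in descending order
    U_k, s_k, Vt_k = U[:, :k], s[:k], Vt[:k, :]
    topics = []
    for t in range(k):
        # Singular vectors are only determined up to sign, so rank terms by |loading|.
        order = np.argsort(np.abs(U_k[:, t]))[::-1][:top_n]
        topics.append([vocab[i] for i in order])
    doc_coords = Vt_k.T * s_k  # scales column t of V_k by sigma_t
    return topics, doc_coords


# Hypothetical toy corpus: two sport sentences and two tech sentences.
docs = [
    "The striker scored a late goal and the team won the match",
    "The coach praised the team after the football match",
    "The new smartphone chip makes the software run faster",
    "Software updates improve battery life on the smartphone",
]
topics, doc_coords = lsa_topics(docs, k=2, top_n=3)
for t, terms in enumerate(topics):
    print(f"Topic {t}: {', '.join(terms)}")
```

In this toy corpus the sport and tech sentences share no terms, so the topics separate cleanly: the first is led by "smartphone" and "software", the second by "team" and "match", which you might label "technology" and "sport". Real news articles overlap far more, and with raw counts the first component can largely reflect overall term frequency, so expect to inspect several topics before labeling them.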

Instructions

  • Download the dataset as a CSV file.
  • Download the Python notebook template file.
  • Complete the lab in either Jupyter Notebook or Google Colab. If you use Colab, expect some extra setup to import the data (for example, uploading the CSV file to the Colab session).
  • Show the lab instructor your completed work before the lab session ends.
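The data-import step can be sketched as a small loader built on Python's standard csv module, plus the Colab upload step for those working in Colab. Assumptions: the file name bbc_news.csv and the column name text are hypothetical (check the header row of the CSV you downloaded), the file is assumed to be UTF-8, and the helpers in_colab and load_articles are illustrative names, not part of the notebook template.

```python
import csv


def in_colab():
    """Return True when the notebook is running inside Google Colab."""
    try:
        import google.colab  # noqa: F401  (this module is only importable inside Colab)
    except ImportError:
        return False
    return True


def load_articles(path, text_column="text"):
    """Return the values of `text_column` from the CSV at `path` as a list of strings.

    `text_column` is a hypothetical default; check the header row of the lab's CSV.
    """
    with open(path, newline="", encoding="utf-8") as f:
        return [row[text_column] for row in csv.DictReader(f)]


if in_colab():
    # files.upload() opens a file picker in the browser, saves the chosen files to
    # the Colab working directory, and returns a dict of {filename: bytes}.
    from google.colab import files
    uploaded = files.upload()
    csv_path = next(iter(uploaded))  # name of the first uploaded file
else:
    # Hypothetical file name: point this at wherever you saved the downloaded CSV.
    csv_path = "bbc_news.csv"
```

After this runs, articles = load_articles(csv_path) gives a list of article texts ready for preprocessing. Jupyter users can skip the upload branch entirely as long as the CSV sits next to the notebook.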