Course Schedule

Introduction to Data Science

Topic 1: Introduction to Data Science

Overview

This topic introduces the fundamental concepts of data science and sets up the tools we’ll use throughout the semester.

Learning Objectives

  • Understand the data science workflow and methodology
  • Learn about key tools and technologies in the field
  • Set up your development environment
  • Explore real-world data science applications

Lectures

Lecture 1: Course Introduction & Data Science Workflow

  • Course overview and expectations
  • Data Science Process: CRISP-DM methodology
  • Workflow stages: Problem definition → Data collection → Data cleaning → Analysis → Modeling → Evaluation → Deployment
  • Real-world data science applications

Lecture 2: Tools & Environment Setup

  • Python ecosystem introduction: pandas, numpy, matplotlib, scikit-learn
  • Jupyter Notebooks: Interactive development environment
  • Environment setup and configuration
  • Basic pandas and matplotlib exercises

Key Concepts

  • Data Science Process: Systematic approach to solving data problems
  • CRISP-DM: Cross-Industry Standard Process for Data Mining
  • Python Ecosystem: Core libraries for data science
  • Jupyter Notebooks: Interactive development environment

Required Reading

  • Course syllabus and policies
  • Python basics review (if needed)
  • Introduction to pandas documentation

Assignments

  • Setup: Install Python, Jupyter, and required packages
  • Tutorial: Complete basic pandas and matplotlib exercises
  • Discussion: Share data science project ideas

Next Topic

We’ll dive into clustering algorithms and begin working with real datasets.

Clustering

Topic 2: Clustering

Overview

This topic covers fundamental clustering algorithms and techniques for grouping similar data points together.

Learning Objectives

  • Understand distance and similarity measures
  • Learn K-means and K-means++ algorithms
  • Explore hierarchical and density-based clustering
  • Apply clustering to real-world datasets

Lectures

Lecture 3: Distance & Similarity

  • Euclidean, Manhattan, and cosine distance
  • Similarity measures and their applications
  • Choosing appropriate distance metrics
  • Lecture Slides - Worksheet
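To make the three distance measures concrete, here is a minimal sketch using only the Python standard library. The function names are illustrative and are not taken from the course materials.

```python
import math

def euclidean(a, b):
    """Straight-line (L2) distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def manhattan(a, b):
    """City-block (L1) distance: sum of absolute coordinate differences."""
    return sum(abs(x - y) for x, y in zip(a, b))

def cosine_distance(a, b):
    """1 - cosine similarity; depends on the angle between vectors, not their length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)

p, q = [0.0, 0.0], [3.0, 4.0]
print(euclidean(p, q))  # 5.0
print(manhattan(p, q))  # 7.0
# Vectors pointing the same way have cosine distance near 0, whatever their lengths.
print(cosine_distance([1.0, 2.0], [2.0, 4.0]))
```

Cosine distance is undefined for an all-zero vector, so a real implementation would need to handle that case.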

Lecture 4: K-means Clustering

  • K-means algorithm and convergence
  • Choosing the number of clusters (k)
  • Initialization strategies
  • Lecture Slides - Worksheet
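The assign-then-update loop behind K-means can be sketched in a few lines of plain Python. This is one possible version of Lloyd's algorithm, not the course's reference implementation. It assumes the caller supplies the initial centroids, and the toy points are made up for illustration.

```python
def kmeans(points, centroids, max_iter=100):
    """Lloyd's algorithm on a list of equal-length tuples.

    `centroids` holds the initial centers; how to choose them is the
    initialization question and is left to the caller in this sketch.
    """
    centroids = [tuple(c) for c in centroids]
    for _ in range(max_iter):
        # Assignment step: attach each point to its nearest centroid.
        clusters = [[] for _ in centroids]
        for p in points:
            dists = [sum((a - b) ** 2 for a, b in zip(p, c)) for c in centroids]
            clusters[dists.index(min(dists))].append(p)
        # Update step: move each centroid to the mean of its cluster.
        new_centroids = []
        for c, members in zip(centroids, clusters):
            if members:
                new_centroids.append(
                    tuple(sum(col) / len(members) for col in zip(*members))
                )
            else:
                new_centroids.append(c)  # leave an empty cluster's centroid in place
        if new_centroids == centroids:  # converged: no centroid moved
            break
        centroids = new_centroids
    return centroids, clusters

points = [(1, 1), (1, 2), (2, 1), (8, 8), (9, 8), (8, 9)]
centroids, clusters = kmeans(points, [(1, 1), (8, 8)])
print(centroids)
```

The result depends on the starting centroids, which is why initialization strategies get their own discussion.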

Lecture 5: K-means++ & Advanced Initialization

Lecture 6: Hierarchical Clustering

  • Agglomerative and divisive approaches
  • Linkage methods (single, complete, average)
  • Dendrogram interpretation
  • Lecture Slides - Worksheet
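The three linkage methods differ only in how they turn the point-to-point distances between two clusters into a single number. A small sketch, assuming Euclidean distance between points; the helper name `linkage_distance` is hypothetical.

```python
import math
from itertools import product

def linkage_distance(cluster_a, cluster_b, method="single"):
    """Distance between two clusters under a given linkage rule."""
    pair_dists = [math.dist(p, q) for p, q in product(cluster_a, cluster_b)]
    if method == "single":    # closest pair of points
        return min(pair_dists)
    if method == "complete":  # farthest pair of points
        return max(pair_dists)
    if method == "average":   # mean over all pairs
        return sum(pair_dists) / len(pair_dists)
    raise ValueError(f"unknown linkage method: {method}")

a = [(0, 0), (0, 1)]
b = [(0, 3), (0, 5)]
for method in ("single", "complete", "average"):
    print(method, linkage_distance(a, b, method))
```

Agglomerative clustering repeatedly merges the two clusters with the smallest linkage distance, and the dendrogram records the distance at which each merge happened.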

Lecture 7: Density-Based Clustering

Lecture 8: Soft Clustering & Aggregation

Key Concepts

  • Distance Metrics: Euclidean, Manhattan, cosine distance
  • K-means: Iterative centroid-based clustering
  • Hierarchical Clustering: Tree-based clustering approach
  • DBSCAN: Density-based clustering for arbitrary shapes
  • Ensemble Methods: Combining multiple clustering results

Assignments

  • Project Proposal Due: Submit your final project proposal
  • Clustering Exercises: Complete worksheets for each algorithm
  • Real-world Application: Apply clustering to a dataset of your choice

Important Dates

  • Project Proposal: Due during this topic

Next Topic

We’ll explore Singular Value Decomposition (SVD) for dimensionality reduction and feature extraction.

Singular Value Decomposition

Topic 3: Singular Value Decomposition

Overview

This topic covers Singular Value Decomposition (SVD), a powerful technique for dimensionality reduction, feature extraction, and data compression.

Learning Objectives

  • Understand the mathematical foundations of SVD
  • Apply SVD for dimensionality reduction
  • Use SVD for feature extraction and data compression
  • Implement SVD in real-world applications

Lectures

Lecture 9: Singular Value Decomposition Fundamentals

  • Mathematical foundations of SVD
  • Matrix decomposition: A = UΣV^T
  • Understanding singular values and their importance
  • Lecture Slides - Worksheet
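As a quick illustration of A = UΣV^T, the sketch below decomposes a small matrix with NumPy and rebuilds it from its factors. The matrix values are arbitrary and chosen only for this example.

```python
import numpy as np

A = np.array([[3.0, 1.0, 1.0],
              [-1.0, 3.0, 1.0]])

# full_matrices=False returns the compact ("economy") factors.
U, s, Vt = np.linalg.svd(A, full_matrices=False)

# s holds the singular values in descending order; np.diag(s) rebuilds Σ.
A_rebuilt = U @ np.diag(s) @ Vt
print(np.allclose(A, A_rebuilt))

# Rank-1 approximation: keep only the largest singular value.
k = 1
A_rank1 = U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]

# The Frobenius-norm error of the truncation comes from the discarded singular values.
print(np.linalg.norm(A - A_rank1), s[1])
```

Larger singular values carry more of the matrix, which is why dropping the smallest ones loses the least information.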

Lecture 10: SVD Applications & Implementation

  • Dimensionality reduction with SVD
  • Feature extraction and compression
  • Principal Component Analysis (PCA) connection
  • Worksheet
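One way to see the PCA connection is to center a data matrix, take its SVD, and compare the squared singular values with the eigenvalues of the sample covariance matrix. The following is a sketch on synthetic data; the data and variable names are assumptions made for the example.

```python
import numpy as np

rng = np.random.default_rng(0)
# Synthetic data: 100 samples, 3 correlated features with very different spreads.
mixing = np.array([[2.0, 0.0, 0.0],
                   [0.5, 1.0, 0.0],
                   [0.0, 0.0, 0.1]])
X = rng.normal(size=(100, 3)) @ mixing

# PCA works on mean-centered data.
Xc = X - X.mean(axis=0)
U, s, Vt = np.linalg.svd(Xc, full_matrices=False)

# Rows of Vt are the principal directions; s**2 / (n - 1) are the variances along them.
explained_variance = s ** 2 / (len(X) - 1)

# Cross-check against the eigenvalues of the sample covariance matrix.
cov_eigvals = np.sort(np.linalg.eigvalsh(np.cov(Xc, rowvar=False)))[::-1]
print(np.allclose(explained_variance, cov_eigvals))

# Dimensionality reduction: project onto the top two principal directions.
X_2d = Xc @ Vt[:2].T
print(X_2d.shape)
```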

Key Concepts

  • SVD: A = UΣV^T, where U and V are orthogonal and Σ is diagonal with non-negative entries (the singular values)
  • Singular Values: Non-negative scale factors that measure how much each component contributes to A
  • Dimensionality Reduction: Reducing features while preserving information
  • Feature Extraction: Finding meaningful patterns in data
  • PCA Connection: PCA can be computed from the SVD of the mean-centered data matrix

Applications

  • Image Compression: Storing a low-rank approximation of an image while preserving most of its visual quality
  • Text Analysis: Latent Semantic Analysis (LSA) for document similarity
  • Recommendation Systems: Matrix factorization for collaborative filtering
  • Noise Reduction: Removing noise from data using low-rank approximation

Assignments

  • SVD Implementation: Complete SVD exercises and applications
  • Dimensionality Reduction: Apply SVD to reduce dataset dimensions
  • Real-world Application: Use SVD for image compression or text analysis

Next Topic

We’ll begin exploring classification algorithms, starting with K-Nearest Neighbors and decision trees.

Classification

Topic 4: Classification

Overview

This topic covers fundamental classification algorithms and techniques for predicting categorical outcomes from data.

Learning Objectives

  • Understand K-Nearest Neighbors and decision trees
  • Learn Naive Bayes and Support Vector Machines
  • Master model evaluation and ensemble methods
  • Apply classification to real-world problems

Lectures

Lecture 11: Introduction to Classification & K-Nearest Neighbors

  • Classification problem formulation
  • K-Nearest Neighbors algorithm
  • Distance metrics for classification
  • Model evaluation basics
  • Lecture Slides - Worksheet
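K-Nearest Neighbors needs no training step: a prediction is a majority vote among the k closest labeled points. A minimal sketch, assuming Euclidean distance and a made-up toy dataset.

```python
import math
from collections import Counter

def knn_predict(train_X, train_y, query, k=3):
    """Predict the label of `query` by majority vote among its k nearest neighbors."""
    # Sort the training points by distance to the query point.
    neighbors = sorted(
        zip(train_X, train_y),
        key=lambda pair: math.dist(pair[0], query),
    )
    top_labels = [label for _, label in neighbors[:k]]
    return Counter(top_labels).most_common(1)[0][0]

train_X = [(1, 1), (1, 2), (2, 1), (8, 8), (9, 8), (8, 9)]
train_y = ["a", "a", "a", "b", "b", "b"]

print(knn_predict(train_X, train_y, (2, 2)))  # a
print(knn_predict(train_X, train_y, (7, 7)))  # b
```

Swapping `math.dist` for another distance function changes which neighbors count as close, so the choice of metric directly affects the predictions.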

Lecture 12: Decision Trees

  • Decision tree construction
  • Information gain and entropy
  • Tree pruning and overfitting
  • Interpretable machine learning
  • Lecture Slides - Worksheet
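Entropy and information gain can be computed directly from label counts. The sketch below uses hypothetical helper names and a four-example toy label set to compare a perfect split with an uninformative one.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def information_gain(parent, children):
    """Entropy of the parent minus the size-weighted entropy of the child nodes."""
    n = len(parent)
    weighted = sum(len(child) / n * entropy(child) for child in children)
    return entropy(parent) - weighted

parent = ["yes", "yes", "no", "no"]
perfect_split = [["yes", "yes"], ["no", "no"]]
useless_split = [["yes", "no"], ["yes", "no"]]

print(information_gain(parent, perfect_split))  # 1.0
print(information_gain(parent, useless_split))  # 0.0
```

A tree-building algorithm of this kind picks, at each node, the split with the highest information gain.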

Lecture 13: Naive Bayes & Model Evaluation

Lecture 14: Support Vector Machines

  • Linear and non-linear SVM
  • Kernel functions and feature spaces
  • Margin maximization
  • SVM for classification and regression
  • Lecture Slides - Worksheet

Lecture 15: Recommender Systems & Midterm Launch

  • Collaborative filtering
  • Content-based recommendation
  • Matrix factorization approaches
  • Midterm 2 competition launch
  • Lecture Slides

Key Concepts

  • Classification: Predicting categorical outcomes
  • K-Nearest Neighbors: Instance-based learning
  • Decision Trees: Rule-based classification
  • Naive Bayes: Probabilistic classification
  • Support Vector Machines: Margin-based classification
  • Model Evaluation: Accuracy, precision, recall, F1-score
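The four evaluation metrics listed above all derive from the counts of true and false positives and negatives. A small sketch for the binary case; the labels are invented for illustration and the function name is hypothetical.

```python
def classification_metrics(y_true, y_pred, positive=1):
    """Return (accuracy, precision, recall, f1) for binary labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)

    accuracy = (tp + tn) / len(pairs)
    # Guard against division by zero when a class is never predicted or never present.
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return accuracy, precision, recall, f1

y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_pred = [1, 0, 1, 0, 1, 1, 0, 1]
accuracy, precision, recall, f1 = classification_metrics(y_true, y_pred)
print(accuracy, precision, recall, f1)
```

Here precision (0.6) and recall (0.75) differ because the classifier makes more false-positive than false-negative errors; F1 is their harmonic mean.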

Important Dates

  • Midterm Report Due: March 31
  • Midterm 2 Launch: April 2

Assignments

  • Classification Exercises: Complete worksheets for each algorithm
  • Model Comparison: Compare different classification algorithms
  • Midterm Report: Submit your midterm project report

Next Topic

We’ll explore regression algorithms, starting with linear regression and model evaluation.

Regression

Topic 5: Regression

Overview

This topic covers regression algorithms: linear regression for predicting continuous outcomes and logistic regression for predicting categorical ones.

Learning Objectives

  • Understand linear regression fundamentals
  • Learn logistic regression for classification
  • Master regression model evaluation
  • Apply regression to real-world problems

Lectures

Lecture 16: Linear Regression

  • Linear regression fundamentals
  • Ordinary Least Squares (OLS)
  • Assumptions and diagnostics
  • Feature engineering for regression
  • Lecture Slides - Worksheet
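For a single feature, Ordinary Least Squares has a closed-form solution: the slope is the covariance of x and y divided by the variance of x. A minimal sketch with made-up data that lies exactly on the line y = 2x + 1.

```python
def ols_fit(xs, ys):
    """Fit y = slope * x + intercept by minimizing the sum of squared residuals."""
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    s_xy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    s_xx = sum((x - x_mean) ** 2 for x in xs)
    slope = s_xy / s_xx
    intercept = y_mean - slope * x_mean
    return slope, intercept

xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.0, 5.0, 7.0, 9.0]
slope, intercept = ols_fit(xs, ys)
print(slope, intercept)  # 2.0 1.0
```

With several features the same idea is usually expressed in matrix form and solved with a linear algebra routine.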

Lecture 17: Linear Model Evaluation

  • Model evaluation metrics (R², RMSE, MAE)
  • Residual analysis and diagnostics
  • Cross-validation for regression
  • Feature importance and interpretation
  • Lecture Slides - Worksheet
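The three regression metrics named above can be computed from the residuals alone. A short sketch using invented true and predicted values; the function name is illustrative.

```python
import math

def regression_metrics(y_true, y_pred):
    """Return (r2, rmse, mae) for paired true and predicted values."""
    n = len(y_true)
    residuals = [t - p for t, p in zip(y_true, y_pred)]
    mae = sum(abs(r) for r in residuals) / n
    rmse = math.sqrt(sum(r * r for r in residuals) / n)
    y_mean = sum(y_true) / n
    ss_res = sum(r * r for r in residuals)          # unexplained variation
    ss_tot = sum((t - y_mean) ** 2 for t in y_true)  # total variation
    r2 = 1.0 - ss_res / ss_tot
    return r2, rmse, mae

y_true = [3.0, 5.0, 7.0, 9.0]
y_pred = [2.0, 5.0, 8.0, 9.0]
r2, rmse, mae = regression_metrics(y_true, y_pred)
print(r2, rmse, mae)
```

RMSE penalizes large errors more heavily than MAE, while R² reports the share of variance in the target that the model explains.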

Lecture 18: Advanced Linear Model Evaluation

  • Multicollinearity and feature selection
  • Regularization (Ridge, Lasso)
  • Model comparison and selection
  • Real-world regression applications
  • Lecture Slides - Worksheet
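Ridge regression has a closed-form solution that makes the effect of the penalty easy to see. The sketch below assumes no intercept term and uses synthetic data; `ridge_fit` and `alpha` are names chosen for this example. Lasso has no such closed form and is not shown.

```python
import numpy as np

def ridge_fit(X, y, alpha):
    """Closed-form ridge weights: solve (X^T X + alpha * I) w = X^T y."""
    n_features = X.shape[1]
    return np.linalg.solve(X.T @ X + alpha * np.eye(n_features), X.T @ y)

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 3))
y = X @ np.array([2.0, -1.0, 0.5]) + 0.1 * rng.normal(size=50)

w_ols = ridge_fit(X, y, alpha=0.0)     # alpha = 0 reduces to ordinary least squares
w_ridge = ridge_fit(X, y, alpha=10.0)  # a larger alpha shrinks the weights toward zero

print(w_ols)
print(w_ridge)
print(np.linalg.norm(w_ridge) < np.linalg.norm(w_ols))
```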

Lecture 19: Logistic Regression

  • Logistic regression fundamentals
  • Binary and multiclass classification
  • Odds ratios and interpretation
  • Model evaluation for classification
  • Lecture Slides - Worksheet
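The link between logistic regression coefficients and odds ratios can be checked numerically. In this sketch the weight and bias are hypothetical values rather than fitted ones, and the model has a single feature.

```python
import math

def sigmoid(z):
    """Map a real-valued score to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def predict_proba(x, weight, bias):
    """Probability of the positive class for a one-feature logistic model."""
    return sigmoid(weight * x + bias)

def odds(p):
    return p / (1.0 - p)

weight, bias = 0.8, -2.0  # hypothetical coefficients, chosen for illustration

p1 = predict_proba(1.0, weight, bias)
p2 = predict_proba(2.0, weight, bias)

# Raising x by one unit multiplies the odds by exp(weight).
odds_ratio = odds(p2) / odds(p1)
print(odds_ratio, math.exp(weight))
```

This is why a coefficient is often reported as exp(weight): it reads as the multiplicative change in odds per unit increase in the feature.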

Lecture 20: Logistic Regression Continued

  • Advanced logistic regression topics
  • Feature engineering for classification
  • Model interpretation and explainability
  • Real-world applications
  • Lecture Slides - Worksheet

Key Concepts

  • Linear Regression: Predicting continuous outcomes
  • Logistic Regression: Predicting categorical outcomes
  • Model Evaluation: R², RMSE, MAE for regression
  • Regularization: Ridge and Lasso regression
  • Feature Engineering: Creating meaningful features
  • Model Interpretation: Understanding model predictions

Applications

  • House Price Prediction: Using features to predict real estate prices
  • Medical Diagnosis: Predicting disease presence from symptoms
  • Marketing: Predicting customer behavior and preferences
  • Finance: Predicting stock prices and market trends

Assignments

  • Regression Exercises: Complete worksheets for linear and logistic regression
  • Model Comparison: Compare different regression approaches
  • Real-world Application: Apply regression to a dataset of your choice

Next Topic

We’ll explore neural networks and deep learning fundamentals.

Neural Networks

Topic 6: Neural Networks

Overview

This topic covers neural networks and deep learning fundamentals, including the backpropagation algorithm and modern neural network architectures.

Learning Objectives

  • Understand neural network fundamentals
  • Learn the backpropagation algorithm
  • Explore modern neural network architectures
  • Apply neural networks to real-world problems

Lectures

Lecture 21: Fundamentals of Neural Networks

  • Neural network architecture and components
  • Activation functions and their properties
  • Forward propagation
  • Loss functions and optimization
  • Lecture Slides - Worksheet
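Forward propagation is repeated matrix multiplication followed by an activation function. A minimal sketch of a network with one ReLU hidden layer and a sigmoid output, using hand-picked weights that are assumptions for this example.

```python
import numpy as np

def relu(z):
    return np.maximum(0.0, z)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def forward(x, W1, b1, W2, b2):
    """One hidden layer with ReLU, then a sigmoid output unit."""
    h = relu(W1 @ x + b1)        # hidden activations
    return sigmoid(W2 @ h + b2)  # output probability

W1 = np.array([[1.0, -1.0],
               [0.5, 0.5]])
b1 = np.zeros(2)
W2 = np.array([[1.0, -2.0]])
b2 = np.array([0.0])

x = np.array([2.0, 1.0])
y_hat = forward(x, W1, b1, W2, b2)
print(y_hat)
```

For this input the hidden layer produces [1.0, 1.5], the output score is -2.0, and the sigmoid maps that score to a probability below 0.5.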

Lecture 22: Backpropagation & Training

  • Backpropagation algorithm
  • Gradient descent and optimization
  • Weight initialization strategies
  • Training neural networks effectively
  • Lecture Slides - Worksheet
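Backpropagation applies the chain rule layer by layer, from the loss back to each weight. The sketch below is an assumed setup rather than the course's implementation: it uses a tanh hidden layer, a linear output, and squared-error loss, then compares one analytic gradient entry with a finite-difference estimate.

```python
import numpy as np

def loss_and_grads(x, y, W1, W2):
    """Forward and backward pass for a two-layer network without biases."""
    # Forward pass.
    h = np.tanh(W1 @ x)
    y_hat = W2 @ h
    loss = 0.5 * np.sum((y_hat - y) ** 2)
    # Backward pass: apply the chain rule from the loss back to each weight matrix.
    d_yhat = y_hat - y            # dL/dy_hat
    dW2 = np.outer(d_yhat, h)     # dL/dW2
    d_h = W2.T @ d_yhat           # dL/dh
    d_z1 = d_h * (1.0 - h ** 2)   # tanh'(z) = 1 - tanh(z)^2
    dW1 = np.outer(d_z1, x)       # dL/dW1
    return loss, dW1, dW2

rng = np.random.default_rng(0)
x, y = rng.normal(size=3), rng.normal(size=2)
W1, W2 = rng.normal(size=(4, 3)), rng.normal(size=(2, 4))

loss, dW1, dW2 = loss_and_grads(x, y, W1, W2)

# Gradient check: nudge one weight and compare with the analytic gradient.
eps = 1e-6
W1_plus, W1_minus = W1.copy(), W1.copy()
W1_plus[0, 0] += eps
W1_minus[0, 0] -= eps
numeric = (loss_and_grads(x, y, W1_plus, W2)[0]
           - loss_and_grads(x, y, W1_minus, W2)[0]) / (2 * eps)
print(numeric, dW1[0, 0])
```

A gradient descent step would then update each weight matrix by subtracting a small learning rate times its gradient.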

Lecture 23: Advanced Neural Networks

  • Convolutional Neural Networks (CNNs)
  • Recurrent Neural Networks (RNNs)
  • Transfer learning and pre-trained models
  • Real-world applications
  • Lecture Slides - Worksheet

Key Concepts

  • Neural Networks: Multi-layer perceptrons for complex pattern recognition
  • Backpropagation: Algorithm that uses the chain rule to compute loss gradients for training neural networks
  • Activation Functions: Non-linear transformations (ReLU, sigmoid, tanh)
  • Loss Functions: Measuring prediction error
  • Optimization: Gradient descent and its variants
  • Deep Learning: Neural networks with multiple hidden layers

Applications

  • Computer Vision: Image classification and object detection
  • Natural Language Processing: Text classification and generation
  • Speech Recognition: Audio processing and transcription
  • Recommendation Systems: Personalized content and product recommendations

Important Dates

  • Final Report Due: May 1
  • Final Exam: May 7

Assignments

  • Neural Network Implementation: Build and train neural networks
  • Final Project: Complete your final project
  • Final Exam Preparation: Review all course material

Course Wrap-up

This concludes our exploration of data science tools and applications. You now have a solid foundation in clustering, SVD, classification, regression, and neural networks!

Final Exam

The final exam will cover all topics from the semester, with emphasis on:

  • Understanding of algorithms and their applications
  • Model evaluation and comparison
  • Real-world problem-solving approaches
  • Practical implementation skills