Course Schedule
Introduction to Data Science
Topic 1: Introduction to Data Science
Overview
This topic introduces the fundamental concepts of data science and sets up the tools we’ll use throughout the semester.
Learning Objectives
- Understand the data science workflow and methodology
- Learn about key tools and technologies in the field
- Set up your development environment
- Explore real-world data science applications
Lectures
Lecture 1: Course Introduction & Data Science Workflow
- Course overview and expectations
- Data Science Process: CRISP-DM methodology
- Problem definition → Data collection → Data cleaning → Analysis → Modeling → Evaluation → Deployment
- Real-world data science applications
Lecture 2: Tools & Environment Setup
- Python ecosystem introduction: pandas, numpy, matplotlib, scikit-learn
- Jupyter Notebooks: Interactive development environment
- Environment setup and configuration
- Basic pandas and matplotlib exercises
Key Concepts
- Data Science Process: Systematic approach to solving data problems
- CRISP-DM: Cross-Industry Standard Process for Data Mining
- Python Ecosystem: Core libraries for data science
- Jupyter Notebooks: Interactive development environment
Required Reading
- Course syllabus and policies
- Python basics review (if needed)
- Introduction to pandas documentation
Assignments
- Setup: Install Python, Jupyter, and required packages
- Tutorial: Complete basic pandas and matplotlib exercises
- Discussion: Share data science project ideas
Next Topic
We’ll dive into clustering algorithms and begin working with real datasets.
Clustering
Topic 2: Clustering
Overview
This topic covers fundamental clustering algorithms and techniques for grouping similar data points together.
Learning Objectives
- Understand distance and similarity measures
- Learn K-means and K-means++ algorithms
- Explore hierarchical and density-based clustering
- Apply clustering to real-world datasets
Lectures
Lecture 3: Distance & Similarity
- Euclidean, Manhattan, and cosine distance
- Similarity measures and their applications
- Choosing appropriate distance metrics
- Lecture Slides - Worksheet
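As a rough illustration of the three distance measures in Lecture 3, the sketch below uses plain Python lists as vectors. The function names and the sample points p and q are illustrative assumptions, not taken from the course materials.

```python
import math

def euclidean(a, b):
    # Straight-line distance: square root of the summed squared differences.
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def manhattan(a, b):
    # City-block distance: sum of absolute coordinate differences.
    return sum(abs(x - y) for x, y in zip(a, b))

def cosine_distance(a, b):
    # One minus the cosine of the angle between the vectors.
    # Undefined for a zero vector, so both inputs must be non-zero.
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)

# Hypothetical sample points.
p, q = [1.0, 0.0], [4.0, 4.0]
print(euclidean(p, q), manhattan(p, q), cosine_distance(p, q))
```

Euclidean and Manhattan distance depend on vector magnitude, while cosine distance depends only on direction, which is one reason the choice of metric matters.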
Lecture 4: K-means Clustering
- K-means algorithm and convergence
- Choosing the number of clusters (k)
- Initialization strategies
- Lecture Slides - Worksheet
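The K-means loop from Lecture 4 can be sketched as below, assuming NumPy is available. This is a minimal version of Lloyd's algorithm with random initialization. The function name, the toy data, and the choice to keep an empty cluster's old centroid are assumptions made for this example.

```python
import numpy as np

def kmeans(X, k, n_iters=100, seed=0):
    rng = np.random.default_rng(seed)
    # Random initialization: pick k distinct data points as starting centroids.
    centroids = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iters):
        # Assignment step: each point goes to its nearest centroid.
        dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Update step: move each centroid to the mean of its points.
        # An empty cluster keeps its previous centroid.
        new_centroids = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        # Convergence: stop once the centroids no longer move.
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids
    return centroids, labels

# Hypothetical toy data: two well-separated groups of three points.
X = np.array([[0.0, 0.0], [0.0, 1.0], [1.0, 0.0],
              [10.0, 10.0], [10.0, 11.0], [11.0, 10.0]])
centroids, labels = kmeans(X, k=2)
print(labels)
```

On less cleanly separated data the result can depend on the starting centroids, which motivates the initialization strategies covered in this lecture and the next.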
Lecture 5: K-means++ & Advanced Initialization
- K-means++ algorithm
- Smart initialization strategies
- Avoiding poor local optima
- Lecture Slides - Worksheet
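A minimal sketch of K-means++ seeding, assuming NumPy: the first centroid is drawn uniformly at random, and each later centroid is drawn with probability proportional to its squared distance from the nearest centroid chosen so far. The function name and toy data are illustrative assumptions.

```python
import numpy as np

def kmeans_pp_init(X, k, seed=0):
    rng = np.random.default_rng(seed)
    # First centroid: a uniformly random data point.
    centroids = [X[rng.integers(len(X))]]
    for _ in range(1, k):
        # Squared distance from every point to its nearest chosen centroid.
        d2 = np.min([np.sum((X - c) ** 2, axis=1) for c in centroids], axis=0)
        # Sample the next centroid with probability proportional to d2,
        # so far-away points are more likely to be picked.
        probs = d2 / d2.sum()
        centroids.append(X[rng.choice(len(X), p=probs)])
    return np.array(centroids)

# Hypothetical toy data: two well-separated groups.
X = np.array([[0.0, 0.0], [0.0, 1.0], [1.0, 0.0],
              [10.0, 10.0], [10.0, 11.0], [11.0, 10.0]])
init = kmeans_pp_init(X, k=2)
print(init)
```

Already-chosen points have zero squared distance and therefore zero sampling probability, so the seeds spread out across the data instead of clumping together.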
Lecture 6: Hierarchical Clustering
- Agglomerative and divisive approaches
- Linkage methods (single, complete, average)
- Dendrogram interpretation
- Lecture Slides - Worksheet
Lecture 7: Density-Based Clustering
- DBSCAN algorithm
- Density-based vs. centroid-based approaches
- Handling noise and outliers
- Lecture Slides - Worksheet
Lecture 8: Soft Clustering & Aggregation
- Fuzzy clustering approaches
- Combining multiple clustering results
- Ensemble clustering methods
- Lecture Slides + Clustering Aggregation - Worksheet
Key Concepts
- Distance Metrics: Euclidean, Manhattan, cosine distance
- K-means: Iterative centroid-based clustering
- Hierarchical Clustering: Tree-based clustering approach
- DBSCAN: Density-based clustering for arbitrary shapes
- Ensemble Methods: Combining multiple clustering results
Assignments
- Project Proposal Due: Submit your final project proposal
- Clustering Exercises: Complete worksheets for each algorithm
- Real-world Application: Apply clustering to a dataset of your choice
Important Dates
- Project Proposal: Due during this topic
Next Topic
We’ll explore Singular Value Decomposition (SVD) for dimensionality reduction and feature extraction.
Singular Value Decomposition
Topic 3: Singular Value Decomposition
Overview
This topic covers Singular Value Decomposition (SVD), a powerful technique for dimensionality reduction, feature extraction, and data compression.
Learning Objectives
- Understand the mathematical foundations of SVD
- Apply SVD for dimensionality reduction
- Use SVD for feature extraction and data compression
- Implement SVD in real-world applications
Lectures
Lecture 9: Singular Value Decomposition Fundamentals
- Mathematical foundations of SVD
- Matrix decomposition: A = UΣV^T
- Understanding singular values and their importance
- Lecture Slides - Worksheet
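To make the factorization A = UΣV^T concrete, here is a small sketch using NumPy's np.linalg.svd. The example matrix is arbitrary and chosen only for illustration.

```python
import numpy as np

# Arbitrary 2x3 example matrix.
A = np.array([[3.0, 1.0, 1.0],
              [-1.0, 3.0, 1.0]])

# Reduced SVD: U is 2x2, s holds the singular values, Vt is 2x3.
U, s, Vt = np.linalg.svd(A, full_matrices=False)

# Rebuild A from its factors: U @ diag(s) @ V^T.
A_rebuilt = U @ np.diag(s) @ Vt

print(s)                          # singular values, largest first
print(np.allclose(A, A_rebuilt))  # reconstruction check
```

NumPy returns the singular values as a 1-D array in decreasing order and returns V^T rather than V, which is why the reconstruction uses Vt directly.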
Lecture 10: SVD Applications & Implementation
- Dimensionality reduction with SVD
- Feature extraction and compression
- Principal Component Analysis (PCA) connection
- Worksheet
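The sketch below illustrates dimensionality reduction and the PCA connection from Lecture 10, assuming NumPy. It computes the SVD of a mean-centered data matrix, keeps the top k components, and reads the PCA scores and variances off the factors. The random data and the choice k = 2 are assumptions made for the example.

```python
import numpy as np

# Hypothetical data: 50 samples, 4 features.
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 4))

# PCA works on mean-centered data.
Xc = X - X.mean(axis=0)
U, s, Vt = np.linalg.svd(Xc, full_matrices=False)

k = 2
# Rank-k approximation: keep only the k largest singular values.
X_k = (U[:, :k] * s[:k]) @ Vt[:k]

# PCA scores: coordinates of each sample along the top-k right singular vectors.
scores = Xc @ Vt[:k].T

# Variance captured by each component (eigenvalues of the sample covariance).
explained_variance = s ** 2 / (len(X) - 1)
print(explained_variance / explained_variance.sum())
```

The rows of Vt play the role of principal directions, and the squared singular values scaled by n - 1 match the eigenvalues of the sample covariance matrix.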
Key Concepts
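- SVD: A = UΣV^T, where U and V are orthogonal and Σ is diagonal with non-negative entries
- Singular Values: The diagonal entries of Σ, sorted in decreasing order; larger values mark components that capture more of the data's structure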
- Dimensionality Reduction: Reducing features while preserving information
- Feature Extraction: Finding meaningful patterns in data
- PCA Connection: Principal Component Analysis can be computed from the SVD of the mean-centered data matrix
Applications
- Image Compression: Storing a low-rank approximation of the image matrix while preserving most of the visual quality
- Text Analysis: Latent Semantic Analysis (LSA) for document similarity
- Recommendation Systems: Matrix factorization for collaborative filtering
- Noise Reduction: Removing noise from data using low-rank approximation
Assignments
- SVD Implementation: Complete SVD exercises and applications
- Dimensionality Reduction: Apply SVD to reduce dataset dimensions
- Real-world Application: Use SVD for image compression or text analysis
Next Topic
We’ll begin exploring classification algorithms, starting with K-Nearest Neighbors and decision trees.
Classification
Topic 4: Classification
Overview
This topic covers fundamental classification algorithms and techniques for predicting categorical outcomes from data.
Learning Objectives
- Understand K-Nearest Neighbors and decision trees
- Learn Naive Bayes and Support Vector Machines
- Master model evaluation and ensemble methods
- Apply classification to real-world problems
Lectures
Lecture 11: Introduction to Classification & K-Nearest Neighbors
- Classification problem formulation
- K-Nearest Neighbors algorithm
- Distance metrics for classification
- Model evaluation basics
- Lecture Slides - Worksheet
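A minimal K-Nearest Neighbors classifier can be sketched in plain Python as follows. It uses Euclidean distance and a majority vote. The function name, the toy training set, and k = 3 are assumptions for illustration.

```python
import math
from collections import Counter

def knn_predict(train_X, train_y, query, k=3):
    # Compute the distance from the query to every training point.
    dists = sorted((math.dist(x, query), y) for x, y in zip(train_X, train_y))
    # Take the labels of the k closest points and return the majority label.
    nearest = [label for _, label in dists[:k]]
    return Counter(nearest).most_common(1)[0][0]

# Hypothetical toy training set with two classes.
train_X = [(1, 1), (1, 2), (2, 1), (8, 8), (8, 9), (9, 8)]
train_y = ["a", "a", "a", "b", "b", "b"]

print(knn_predict(train_X, train_y, (2, 2)))
print(knn_predict(train_X, train_y, (7, 7)))
```

There is no training step: the model is the stored data, and all the work happens at prediction time.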
Lecture 12: Decision Trees
- Decision tree construction
- Information gain and entropy
- Tree pruning and overfitting
- Interpretable machine learning
- Lecture Slides - Worksheet
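Entropy and information gain, the split criteria named in Lecture 12, can be sketched in plain Python. The function names and the toy labels are illustrative assumptions.

```python
import math
from collections import Counter

def entropy(labels):
    # Shannon entropy in bits: -sum(p * log2(p)) over the class proportions.
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def information_gain(parent, children):
    # Parent entropy minus the size-weighted entropy of the child nodes.
    n = len(parent)
    weighted = sum(len(child) / n * entropy(child) for child in children)
    return entropy(parent) - weighted

# Hypothetical node with a 50/50 class mix.
parent = ["yes", "yes", "no", "no"]
perfect_split = [["yes", "yes"], ["no", "no"]]
useless_split = [["yes", "no"], ["yes", "no"]]

print(information_gain(parent, perfect_split))
print(information_gain(parent, useless_split))
```

A tree-building algorithm would evaluate candidate splits this way and choose the one with the highest gain.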
Lecture 13: Naive Bayes & Model Evaluation
- Naive Bayes algorithm
- Probability and Bayes’ theorem
- Model evaluation metrics (accuracy, precision, recall, F1)
- Ensemble methods introduction
- Lecture Slides + Model Evaluation - Worksheet
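The four evaluation metrics from Lecture 13 can be computed from confusion-matrix counts, as in this plain-Python sketch for binary labels. The function name and the example predictions are assumptions for illustration.

```python
def classification_metrics(y_true, y_pred, positive=1):
    # Confusion-matrix counts for the chosen positive class.
    tp = sum(t == positive and p == positive for t, p in zip(y_true, y_pred))
    fp = sum(t != positive and p == positive for t, p in zip(y_true, y_pred))
    fn = sum(t == positive and p != positive for t, p in zip(y_true, y_pred))
    tn = len(y_true) - tp - fp - fn

    accuracy = (tp + tn) / len(y_true)
    precision = tp / (tp + fp) if tp + fp else 0.0   # of predicted positives, how many are right
    recall = tp / (tp + fn) if tp + fn else 0.0      # of actual positives, how many are found
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)            # harmonic mean of precision and recall
    return accuracy, precision, recall, f1

# Hypothetical labels: 3 true positives, 1 false negative, 1 false positive, 5 true negatives.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 1, 0, 0, 0, 0, 0]
accuracy, precision, recall, f1 = classification_metrics(y_true, y_pred)
print(accuracy, precision, recall, f1)
```

Returning 0.0 when a denominator is zero is one common convention, not the only one.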
Lecture 14: Support Vector Machines
- Linear and non-linear SVM
- Kernel functions and feature spaces
- Margin maximization
- SVM for classification and regression
- Lecture Slides - Worksheet
Lecture 15: Recommender Systems & Midterm Launch
- Collaborative filtering
- Content-based recommendation
- Matrix factorization approaches
- Midterm 2 competition launch
- Lecture Slides
Key Concepts
- Classification: Predicting categorical outcomes
- K-Nearest Neighbors: Instance-based learning
- Decision Trees: Rule-based classification
- Naive Bayes: Probabilistic classification
- Support Vector Machines: Margin-based classification
- Model Evaluation: Accuracy, precision, recall, F1-score
Important Dates
- Midterm Report Due: March 31
- Midterm 2 Launch: April 2
Assignments
- Classification Exercises: Complete worksheets for each algorithm
- Model Comparison: Compare different classification algorithms
- Midterm Report: Submit your midterm project report
Next Topic
We’ll explore regression algorithms, starting with linear regression and model evaluation.
Regression
Topic 5: Regression
Overview
This topic covers regression algorithms for predicting continuous and categorical outcomes, including linear and logistic regression.
Learning Objectives
- Understand linear regression fundamentals
- Learn logistic regression for classification
- Master regression model evaluation
- Apply regression to real-world problems
Lectures
Lecture 16: Linear Regression
- Linear regression fundamentals
- Ordinary Least Squares (OLS)
- Assumptions and diagnostics
- Feature engineering for regression
- Lecture Slides - Worksheet
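As a small illustration of Ordinary Least Squares, the sketch below fits a line with NumPy in two equivalent ways: np.linalg.lstsq and the normal equations. The data are synthetic and noise-free, generated from y = 2x + 1 purely for the example.

```python
import numpy as np

# Synthetic, noise-free data from y = 2x + 1.
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = 2.0 * x + 1.0

# Design matrix with a column of ones for the intercept.
X = np.column_stack([np.ones_like(x), x])

# OLS via a least-squares solver (numerically preferable in practice).
beta_lstsq, *_ = np.linalg.lstsq(X, y, rcond=None)

# OLS via the normal equations: (X^T X) beta = X^T y.
beta_normal = np.linalg.solve(X.T @ X, X.T @ y)

print(beta_lstsq)   # [intercept, slope]
print(beta_normal)
```

Both approaches minimize the sum of squared residuals; the solver-based route tends to be more stable when features are nearly collinear.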
Lecture 17: Linear Model Evaluation
- Model evaluation metrics (R², RMSE, MAE)
- Residual analysis and diagnostics
- Cross-validation for regression
- Feature importance and interpretation
- Lecture Slides - Worksheet
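The regression metrics listed in Lecture 17 follow directly from the residuals. Here is a plain-Python sketch; the function name and sample values are illustrative assumptions.

```python
import math

def regression_metrics(y_true, y_pred):
    n = len(y_true)
    residuals = [t - p for t, p in zip(y_true, y_pred)]
    # MAE: average absolute error.
    mae = sum(abs(r) for r in residuals) / n
    # RMSE: square root of the average squared error.
    rmse = math.sqrt(sum(r * r for r in residuals) / n)
    # R^2: 1 minus residual sum of squares over total sum of squares.
    mean_y = sum(y_true) / n
    ss_res = sum(r * r for r in residuals)
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    r2 = 1.0 - ss_res / ss_tot
    return r2, rmse, mae

# Hypothetical targets and predictions.
y_true = [3.0, 5.0, 7.0, 9.0]
y_pred = [2.0, 5.0, 8.0, 9.0]
r2, rmse, mae = regression_metrics(y_true, y_pred)
print(r2, rmse, mae)
```

RMSE penalizes large errors more heavily than MAE, and R² compares the model against always predicting the mean.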
Lecture 18: Advanced Linear Model Evaluation
- Multicollinearity and feature selection
- Regularization (Ridge, Lasso)
- Model comparison and selection
- Real-world regression applications
- Lecture Slides - Worksheet
Lecture 19: Logistic Regression
- Logistic regression fundamentals
- Binary and multiclass classification
- Odds ratios and interpretation
- Model evaluation for classification
- Lecture Slides - Worksheet
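To illustrate how logistic regression coefficients relate to odds ratios, the sketch below evaluates a one-feature model in plain Python. The weight w and bias b are made-up values, not fitted to any data.

```python
import math

def sigmoid(z):
    # Maps any real number to a probability in (0, 1).
    return 1.0 / (1.0 + math.exp(-z))

def predict_proba(x, w, b):
    # Probability of the positive class for a single feature x.
    return sigmoid(w * x + b)

def odds(p):
    # Odds corresponding to probability p.
    return p / (1.0 - p)

# Hypothetical (not fitted) coefficients.
w, b = 0.8, -2.0

p1 = predict_proba(1.0, w, b)
p2 = predict_proba(2.0, w, b)

# Increasing x by one unit multiplies the odds by exp(w).
odds_ratio = odds(p2) / odds(p1)
print(odds_ratio, math.exp(w))
```

The model is linear in the log-odds, which is why a one-unit change in a feature corresponds to a constant multiplicative change in the odds.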
Lecture 20: Logistic Regression Continued
- Advanced logistic regression topics
- Feature engineering for classification
- Model interpretation and explainability
- Real-world applications
- Lecture Slides - Worksheet
Key Concepts
- Linear Regression: Predicting continuous outcomes
- Logistic Regression: Predicting categorical outcomes
- Model Evaluation: R², RMSE, MAE for regression
- Regularization: Ridge and Lasso regression
- Feature Engineering: Creating meaningful features
- Model Interpretation: Understanding model predictions
Applications
- House Price Prediction: Using features to predict real estate prices
- Medical Diagnosis: Predicting disease presence from symptoms
- Marketing: Predicting customer behavior and preferences
- Finance: Predicting stock prices and market trends
Assignments
- Regression Exercises: Complete worksheets for linear and logistic regression
- Model Comparison: Compare different regression approaches
- Real-world Application: Apply regression to a dataset of your choice
Next Topic
We’ll explore neural networks and deep learning fundamentals.
Neural Networks
Topic 6: Neural Networks
Overview
This topic covers neural networks and deep learning fundamentals, including the backpropagation algorithm and modern neural network architectures.
Learning Objectives
- Understand neural network fundamentals
- Learn the backpropagation algorithm
- Explore modern neural network architectures
- Apply neural networks to real-world problems
Lectures
Lecture 21: Fundamentals of Neural Networks
- Neural network architecture and components
- Activation functions and their properties
- Forward propagation
- Loss functions and optimization
- Lecture Slides - Worksheet
Lecture 22: Backpropagation & Training
- Backpropagation algorithm
- Gradient descent and optimization
- Weight initialization strategies
- Training neural networks effectively
- Lecture Slides - Worksheet
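Backpropagation and gradient descent can be sketched for a tiny network with one tanh hidden layer and a squared-error loss, assuming NumPy. The architecture, the starting weights, the single training example, and the learning rate are all arbitrary choices made for this illustration.

```python
import numpy as np

def loss_and_grads(x, y, W1, b1, W2, b2):
    # Forward pass: input -> tanh hidden layer -> linear output.
    h = np.tanh(W1 @ x + b1)
    y_hat = W2 @ h + b2
    err = y_hat - y
    loss = 0.5 * float(np.sum(err ** 2))
    # Backward pass: apply the chain rule layer by layer.
    dW2 = np.outer(err, h)
    db2 = err
    dh = W2.T @ err
    dz1 = dh * (1.0 - h ** 2)      # derivative of tanh is 1 - tanh^2
    dW1 = np.outer(dz1, x)
    db1 = dz1
    return loss, (dW1, db1, dW2, db2)

# Hypothetical single training example and starting weights.
x, y = np.array([1.0, 0.5]), np.array([1.0])
W1 = np.array([[0.1, -0.2], [0.3, 0.4], [-0.5, 0.2]])
b1 = np.zeros(3)
W2 = np.array([[0.2, -0.1, 0.3]])
b2 = np.zeros(1)

# Gradient check: compare one analytic gradient entry to a finite difference.
_, grads = loss_and_grads(x, y, W1, b1, W2, b2)
analytic = grads[0][0, 0]
eps = 1e-5
W1_plus, W1_minus = W1.copy(), W1.copy()
W1_plus[0, 0] += eps
W1_minus[0, 0] -= eps
numeric = (loss_and_grads(x, y, W1_plus, b1, W2, b2)[0]
           - loss_and_grads(x, y, W1_minus, b1, W2, b2)[0]) / (2 * eps)

# Plain gradient descent on all parameters.
lr = 0.1
params = [W1, b1, W2, b2]
losses = []
for _ in range(200):
    loss, grads = loss_and_grads(x, y, *params)
    losses.append(loss)
    params = [p - lr * g for p, g in zip(params, grads)]

print(analytic, numeric)
print(losses[0], losses[-1])
```

Comparing analytic gradients against finite differences is a common way to catch mistakes in a hand-written backward pass before training.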
Lecture 23: Advanced Neural Networks
- Convolutional Neural Networks (CNNs)
- Recurrent Neural Networks (RNNs)
- Transfer learning and pre-trained models
- Real-world applications
- Lecture Slides - Worksheet
Key Concepts
- Neural Networks: Multi-layer perceptrons for complex pattern recognition
- Backpropagation: Algorithm for training neural networks
- Activation Functions: Non-linear transformations (ReLU, sigmoid, tanh)
- Loss Functions: Measuring prediction error
- Optimization: Gradient descent and its variants
- Deep Learning: Neural networks with multiple hidden layers
Applications
- Computer Vision: Image classification and object detection
- Natural Language Processing: Text classification and generation
- Speech Recognition: Audio processing and transcription
- Recommendation Systems: Personalized content and product recommendations
Important Dates
- Final Report Due: May 1
- Final Exam: May 7
Assignments
- Neural Network Implementation: Build and train neural networks
- Final Project: Complete your final project
- Final Exam Preparation: Review all course material
Course Wrap-up
This concludes our exploration of data science tools and applications. You now have a solid foundation in clustering, singular value decomposition, classification, regression, and neural networks!
Final Exam
The final exam will cover all topics from the semester, with emphasis on:
- Understanding of algorithms and their applications
- Model evaluation and comparison
- Real-world problem-solving approaches
- Practical implementation skills