# Lab 8: Multiclass Logistic Regression

## Objective
In this lab, you will:

1. Generate a three-class dataset with three distinct clusters.
2. Train two logistic regression models:
    * One-vs-Rest (OvR) logistic regression.
    * Softmax (Multinomial) logistic regression.
3. Evaluate both models using classification scores and cross-entropy loss.
4. Visualize decision boundaries and decision hyperplanes for each class.
5. Compare and analyze the performance and decision boundaries of both approaches. 


## Step 1: Generate a Three-Class Dataset
We’ll start by generating a three-class dataset with three distinct clusters.

In [None]:
import numpy as np
import matplotlib.pyplot as plt
from sklearn.datasets import make_blobs
from sklearn.model_selection import train_test_split

# Define centers for three clusters
centers = [[-3, -2], [1, 3], [1, -1]]
X, y = make_blobs(n_samples=1000, centers=centers, random_state=42)

# Apply the transformation to the entire dataset
transformation = [[0.4, 0.2], [0.2, 1.8]]
X_transformed = np.dot(X, transformation)

# Visualize the transformed dataset
plt.scatter(X_transformed[:, 0], X_transformed[:, 1], c=y, cmap=plt.cm.coolwarm, edgecolor='k', s=50)
plt.xlabel("Transformed Feature 1")
plt.ylabel("Transformed Feature 2")
plt.title("Three-Class Dataset After Linear Transformation")
plt.show()

# Split dataset into training and test sets
X_train, X_test, y_train, y_test = train_test_split(X_transformed, y, test_size=0.3, random_state=42)


## Step 2: Train Logistic Regression Models with OvR and Softmax
We’ll train one logistic regression model with the OvR (one-vs-rest) approach and another with the Softmax (multinomial) approach.
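
As a rough illustration of how the two approaches turn per-class linear scores into probabilities, the sketch below uses plain NumPy on made-up scores. The array `scores` and both helper functions are hypothetical and are not the lab's required solution. It assumes OvR applies a sigmoid to each class score independently and then renormalizes each row, while softmax exponentiates all class scores jointly.

```python
import numpy as np

def ovr_probabilities(scores):
    """Sigmoid per class, then renormalize each row to sum to 1."""
    sig = 1.0 / (1.0 + np.exp(-scores))
    return sig / sig.sum(axis=1, keepdims=True)

def softmax_probabilities(scores):
    """Softmax over classes, shifted by the row max for numerical stability."""
    shifted = scores - scores.max(axis=1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=1, keepdims=True)

# Hypothetical linear scores (one row per sample, one column per class)
scores = np.array([[2.0, 0.5, -1.0],
                   [0.1, 0.2, 0.3]])

p_ovr = ovr_probabilities(scores)
p_softmax = softmax_probabilities(scores)
print(p_ovr)
print(p_softmax)
```

Given the same scores, both mappings pick the same most likely class, but the probabilities themselves differ. In practice the two models also learn different scores, since OvR fits one binary problem per class and softmax fits all classes jointly.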

In [None]:
from sklearn.linear_model import LogisticRegression

# TODO: Train One-vs-Rest Logistic Regression
ovr_model = ...

# TODO: Train Softmax (Multinomial) Logistic Regression
softmax_model = ...


## Step 3: Evaluate and Compare Performance

Evaluate the models using:
* Classification Scores: Accuracy, precision, recall, and F1-score.
* Cross-Entropy Loss
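
For reference, cross-entropy loss is the mean negative log-probability that a model assigns to the true class. The sketch below computes it with NumPy on hypothetical labels and probabilities (`y_true`, `proba`, and the helper `cross_entropy` are made up for illustration). `sklearn.metrics.log_loss` computes the same quantity from a model's `predict_proba` output.

```python
import numpy as np

def cross_entropy(y_true, proba, eps=1e-15):
    """Mean negative log-probability of the true class."""
    proba = np.clip(proba, eps, 1.0)  # avoid log(0)
    true_class_proba = proba[np.arange(len(y_true)), y_true]
    return -np.mean(np.log(true_class_proba))

# Hypothetical true labels and predicted class probabilities
y_true = np.array([0, 1, 2, 1])
proba = np.array([[0.7, 0.2, 0.1],
                  [0.1, 0.8, 0.1],
                  [0.2, 0.2, 0.6],
                  [0.4, 0.5, 0.1]])

loss = cross_entropy(y_true, proba)
accuracy = np.mean(proba.argmax(axis=1) == y_true)
print(f"cross-entropy: {loss:.4f}, accuracy: {accuracy:.2f}")
```

Two models can reach the same accuracy and still differ in cross-entropy, because the loss also depends on how much probability is placed on the correct class.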

In [None]:
# TODO: Compute classification scores (accuracy, precision, recall, F1) for both models

In [None]:
# TODO: Compute the cross-entropy loss for both models

## Step 4: Visualize Decision Boundaries and Hyperplanes
We’ll visualize the decision boundaries and hyperplanes for each model. 
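
The sketch below illustrates the two ideas on a hypothetical linear model rather than on the lab's fitted models: the weight matrix `W` and bias `b` are made-up values. It shows how to classify every point of a mesh grid to obtain decision regions, and how to solve each class's zero-score equation for the second feature to obtain a line. It assumes every `W[k, 1]` is nonzero.

```python
import numpy as np

# Hypothetical parameters of a 3-class linear model on 2 features
W = np.array([[-1.0, -0.5],
              [ 0.5,  1.0],
              [ 0.5, -0.5]])   # shape (n_classes, n_features)
b = np.array([0.0, -0.5, 0.5])

# Grid of points covering a region of the feature space
xx, yy = np.meshgrid(np.linspace(-3, 3, 50), np.linspace(-3, 3, 40))
grid = np.c_[xx.ravel(), yy.ravel()]  # one row per grid point

# Predicted class per grid point, reshaped so it can be passed to
# plt.contourf(xx, yy, Z)
Z = (grid @ W.T + b).argmax(axis=1).reshape(xx.shape)

# Line where class k's score is zero:
# W[k, 0] * x1 + W[k, 1] * x2 + b[k] = 0  =>  x2 = -(W[k, 0] * x1 + b[k]) / W[k, 1]
x1 = np.linspace(-3, 3, 50)
lines = [-(W[k, 0] * x1 + b[k]) / W[k, 1] for k in range(len(W))]
```

For a fitted scikit-learn `LogisticRegression` on three classes, `coef_` and `intercept_` hold the analogous per-class weights and biases, and `predict` applied to `grid` returns the class labels directly.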

In [None]:
def plot_decision_boundary(model, X, y, model_name):
    # Set up a grid to plot decision boundaries
    x_min, x_max = X[:, 0].min() - 1, X[:, 0].max() + 1
    y_min, y_max = X[:, 1].min() - 1, X[:, 1].max() + 1
    xx, yy = np.meshgrid(np.linspace(x_min, x_max, 200), np.linspace(y_min, y_max, 200))

    # TODO: Predict the class of each grid point and reshape the result to xx.shape
    Z = ...
    
    plt.contourf(xx, yy, Z, alpha=0.3, cmap=plt.cm.coolwarm)
    plt.scatter(X[:, 0], X[:, 1], c=y, edgecolor='k', cmap=plt.cm.coolwarm, s=50)
    plt.xlabel("Feature 1")
    plt.ylabel("Feature 2")
    plt.title(f"Decision Boundary for {model_name}")
    plt.show()

# Plot decision boundaries for OvR and Softmax models
plot_decision_boundary(ovr_model, X_transformed, y, "One-vs-Rest (OvR)")
plot_decision_boundary(softmax_model, X_transformed, y, "Softmax (Multinomial)")


In [None]:
def plot_decision_hyperplanes(model, X, y, model_name):
    x_min, x_max = X[:, 0].min() - 1, X[:, 0].max() + 1
    y_min, y_max = X[:, 1].min() - 1, X[:, 1].max() + 1
    xx = np.linspace(x_min, x_max, 200)

    # TODO: Plot each class's decision hyperplane

    # Scatter plot of data points
    plt.scatter(X[:, 0], X[:, 1], c=y, edgecolor='k', cmap=plt.cm.coolwarm, s=50)
    plt.xlim(x_min, x_max)
    plt.ylim(y_min, y_max)
    plt.xlabel("Feature 1")
    plt.ylabel("Feature 2")
    plt.title(f"Decision Hyperplanes for {model_name}")
    plt.legend()
    plt.show()

# Plot decision hyperplanes for both models
plot_decision_hyperplanes(ovr_model, X_transformed, y, "One-vs-Rest (OvR)")
plot_decision_hyperplanes(softmax_model, X_transformed, y, "Softmax (Multinomial)")


## Step 5: Compare and Analyze the Results
Answer the following questions:
* Which model has better performance overall? Why do you think this is the case?
* Which model has a lower cross-entropy loss, and what does this tell you about the models' probabilistic outputs?
* What differences do you observe between the OvR and Softmax approaches in the way decision boundaries and hyperplanes are constructed and positioned? Can you explain why these differences occur based on how each model optimizes for multi-class classification?

