From Basic Concepts to Advanced Applications
Logistic Regression is a fundamental classification algorithm that helps us predict categorical outcomes. Think of it as a smart decision-maker that answers "yes" or "no" questions with probabilities.
It classifies data into two distinct categories (0 or 1, Yes or No, True or False), which makes it well suited to spam detection, medical diagnosis, and credit scoring.
Instead of returning only a class label, it provides a probability score between 0 and 1 that indicates how confident the prediction is.
It creates a boundary that separates the classes in feature space, typically where the predicted probability equals 0.5.
σ(z) = 1 / (1 + e^(-z))
This magical function converts any number into a probability between 0 and 1!
First, we calculate a linear combination of the inputs: z = w₁x₁ + w₂x₂ + ... + b, where w are the weights, x are the features, and b is the bias.
The linear result is passed through the sigmoid function to squash it between 0 and 1.
We use Log Loss (Binary Cross-Entropy) to measure how wrong our predictions are.
J(θ) = -(1/m) Σ[y·log(ŷ) + (1-y)·log(1-ŷ)]
Measures the error between predicted and actual values.
import numpy as np

def sigmoid(z):
    # Squash any real number into the (0, 1) range
    return 1 / (1 + np.exp(-z))

def predict(features, weights, bias):
    # Linear combination of the inputs, then the sigmoid
    z = np.dot(features, weights) + bias
    return sigmoid(z)
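To turn the Log Loss formula above into code, here is a minimal sketch assuming NumPy arrays of true labels y and predicted probabilities y_hat; the helper name log_loss and the clipping constant are illustrative choices, not part of the original:

def log_loss(y, y_hat, eps=1e-15):
    # Binary cross-entropy; clip predictions to avoid log(0)
    y_hat = np.clip(y_hat, eps, 1 - eps)
    return -np.mean(y * np.log(y_hat) + (1 - y) * np.log(1 - y_hat))

y = np.array([1, 0, 1])
y_hat = np.array([0.9, 0.2, 0.7])
print(log_loss(y, y_hat))   # roughly 0.23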
Explore how different parameters affect the decision boundary. With two features, the boundary is the line where w₁x₁ + w₂x₂ + b = 0, i.e. where the predicted probability is exactly 0.5.
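A minimal sketch that solves for that boundary line; the helper name boundary_x2 and the weight values are illustrative:

def boundary_x2(x1, w1, w2, b):
    # Solve w1*x1 + w2*x2 + b = 0 for x2 (the 0.5-probability line)
    return -(w1 * x1 + b) / w2

x1 = np.linspace(-3, 3, 5)
print(boundary_x2(x1, w1=1.0, w2=1.0, b=0.0))    # boundary passes through the origin
print(boundary_x2(x1, w1=2.0, w2=1.0, b=-1.0))   # a larger w1 and nonzero b tilt and shift the line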
L1 (Lasso) and L2 (Ridge) regularization prevent overfitting by adding penalty terms to the cost function, controlling model complexity.
Logistic regression extends to multiple classes using One-vs-Rest (OvR), or softmax (multinomial) regression when the classes are mutually exclusive.
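For the softmax case, a minimal sketch that turns one linear score per class into a probability distribution; the score values are made up for illustration:

def softmax(scores):
    # Convert a vector of class scores into probabilities that sum to 1
    shifted = scores - np.max(scores)      # subtract the max for numerical stability
    exp_scores = np.exp(shifted)
    return exp_scores / exp_scores.sum()

print(softmax(np.array([2.0, 1.0, 0.1])))  # roughly [0.66, 0.24, 0.10]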
Beyond gradient descent: Adam, RMSprop, and L-BFGS optimizers can offer faster convergence and better performance.
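As one example, a minimal sketch of a single Adam update step that could replace the plain gradient-descent update in the implementation further below; the hyperparameter defaults follow common conventions and are not from the original:

def adam_step(w, grad, m, v, t, lr=0.001, beta1=0.9, beta2=0.999, eps=1e-8):
    # Running averages of the gradient (m) and squared gradient (v)
    m = beta1 * m + (1 - beta1) * grad
    v = beta2 * v + (1 - beta2) * grad ** 2
    # Bias-corrected estimates (t is the 1-based step count)
    m_hat = m / (1 - beta1 ** t)
    v_hat = v / (1 - beta2 ** t)
    # Parameter update
    w = w - lr * m_hat / (np.sqrt(v_hat) + eps)
    return w, m, v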
J(θ) = -(1/m) Σ[y·log(ŷ) + (1-y)·log(1-ŷ)] + (λ/(2m)) Σ θⱼ²
λ controls the strength of regularization.
class LogisticRegression:
    def __init__(self, lr=0.01, epochs=1000, reg_lambda=0.1):
        self.lr = lr                    # learning rate
        self.epochs = epochs            # number of gradient-descent passes
        self.reg_lambda = reg_lambda    # L2 regularization strength (lambda)

    def fit(self, X, y):
        # Initialize parameters: one weight per feature, scalar bias
        m, n = X.shape
        self.weights = np.zeros(n)
        self.bias = 0.0
        # Gradient descent with L2 regularization
        for epoch in range(self.epochs):
            # Forward pass
            z = np.dot(X, self.weights) + self.bias
            predictions = sigmoid(z)
            # Backward pass with regularization (the bias is not regularized)
            dw = (1 / m) * np.dot(X.T, (predictions - y)) + \
                 (self.reg_lambda / m) * self.weights
            db = (1 / m) * np.sum(predictions - y)
            # Update parameters
            self.weights -= self.lr * dw
            self.bias -= self.lr * db

    def predict_proba(self, X):
        # Probability of class 1 for each row of X
        return sigmoid(np.dot(X, self.weights) + self.bias)
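A quick usage sketch on a tiny made-up dataset, just to show the fit and predict cycle:

X = np.array([[0.5, 1.2], [1.5, 0.3], [3.0, 2.5], [2.2, 3.1]])
y = np.array([0, 0, 1, 1])

model = LogisticRegression(lr=0.1, epochs=5000, reg_lambda=0.01)
model.fit(X, y)
print(model.predict_proba(X))   # higher probabilities for the class-1 rows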
The Receiver Operating Characteristic (ROC) curve plots the true positive rate against the false positive rate at various threshold settings.
The Area Under the Curve (AUC) measures the model's ability to distinguish between classes: AUC = 0.5 is random guessing, AUC = 1 is perfect separation.
Useful for imbalanced datasets where accuracy can be misleading.
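A small sketch, assuming scikit-learn is available, that computes the AUC and ROC curve from predicted probabilities; the labels and scores below are made up:

from sklearn.metrics import roc_auc_score, roc_curve

y_true = [0, 0, 1, 1, 0, 1]                      # actual labels
y_scores = [0.1, 0.4, 0.35, 0.8, 0.2, 0.9]       # predicted probabilities for class 1

print(roc_auc_score(y_true, y_scores))           # AUC, roughly 0.89 for these values
fpr, tpr, thresholds = roc_curve(y_true, y_scores)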
Try adjusting the weights and bias and observe how they affect the classification probability:
Adjust parameters to achieve 95%+ probability for Class 1. Hint: Increase weights and adjust bias positively.
Find parameter settings that keep probability exactly at 50%. This represents maximum uncertainty.
Give Weight 1 a large positive value and Weight 2 an equally large negative value. Observe how conflicting features affect the decision.
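To reproduce these scenarios in code, a small sketch using the predict function defined earlier; the feature values and parameter settings are illustrative:

x = np.array([1.0, 1.0])   # a made-up two-feature input

print(predict(x, weights=np.array([3.0, 3.0]), bias=1.0))    # ~0.999: confidently class 1
print(predict(x, weights=np.array([0.0, 0.0]), bias=0.0))    # 0.5: maximum uncertainty
print(predict(x, weights=np.array([3.0, -3.0]), bias=0.0))   # 0.5: conflicting features cancel out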