We collect pairs (x, y). Example: hours studied (x) and test score (y). The points form a cloud. We want the line that runs through the middle so we can predict y for new x.
Input x → what you know.
Output y → what you predict.
Pattern → often straight!
MODEL
The line (model)
Our model is y = m·x + b. Here m is slope (how fast y changes as x increases), and b is intercept (y when x=0).
Slope m
If x increases by 1, y changes by m.
Intercept b
When x = 0, we predict y = b.
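A minimal sketch of the model in plain Python (the function name `predict` is ours, not from any library):

```python
def predict(x, m, b):
    """Predict y from x using the line y = m*x + b."""
    return m * x + b

print(predict(4, m=2, b=3))  # 2*4 + 3 = 11
```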
MATH
Error (Loss)
For each point: prediction ŷ = m·x + b, error e = ŷ − y. We square and average:
MSE = average( (m·x + b − y)² )
Smaller MSE ⇒ better line.
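The loss in code, with made-up toy numbers for the study-hours example (both the data and the function name are ours):

```python
def mse(xs, ys, m, b):
    """Mean squared error of the line y = m*x + b over all (x, y) pairs."""
    errors = [(m * x + b - y) ** 2 for x, y in zip(xs, ys)]
    return sum(errors) / len(errors)

# Toy data (invented for illustration): hours studied vs. test score
hours = [1, 2, 3, 4]
scores = [52, 55, 61, 64]
print(mse(hours, scores, m=4, b=48))  # 0.5 (smaller is better)
```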
LEARN
Gradient Descent
m ← m − α · (2/n) · Σ( (m·x + b − y)·x )
b ← b − α · (2/n) · Σ( (m·x + b − y) )
α is the learning rate (step size).
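One update step, translated directly from the two rules above (a sketch; `gradient_step` is our own name):

```python
def gradient_step(xs, ys, m, b, alpha):
    """One gradient-descent update of m and b on the MSE loss."""
    n = len(xs)
    grad_m = (2 / n) * sum((m * x + b - y) * x for x, y in zip(xs, ys))
    grad_b = (2 / n) * sum((m * x + b - y) for x, y in zip(xs, ys))
    return m - alpha * grad_m, b - alpha * grad_b
```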
TRY
[Interactive demo: choose a dataset, then tune m (slope), b (intercept), and α (learning rate). The "Data & Lines" chart plots the points, your line, and the best-fit line with its MSE; the "Learning" chart tracks MSE over gradient-descent steps.]
NEXT
Beyond straight lines
Multiple features: y = b + m₁x₁ + m₂x₂ + … (sketched in code after this list)
Regularisation: keep the line simple (Ridge/Lasso).
Non-linear patterns: add x², or switch models.
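A sketch of how the same gradient descent stretches to two features (all names here are ours, invented for illustration):

```python
def predict2(x1, x2, m1, m2, b):
    """Two-feature model: y = b + m1*x1 + m2*x2."""
    return m1 * x1 + m2 * x2 + b

def gradient_step2(data, m1, m2, b, alpha):
    """One MSE gradient step; data is a list of (x1, x2, y) triples."""
    n = len(data)
    g1 = (2 / n) * sum((predict2(x1, x2, m1, m2, b) - y) * x1 for x1, x2, y in data)
    g2 = (2 / n) * sum((predict2(x1, x2, m1, m2, b) - y) * x2 for x1, x2, y in data)
    gb = (2 / n) * sum((predict2(x1, x2, m1, m2, b) - y) for x1, x2, y in data)
    return m1 - alpha * g1, m2 - alpha * g2, b - alpha * gb

# Non-linear trick: treat x² as a second feature, and the model is linear again.
# data = [(x, x**2, y) for x, y in zip(xs, ys)]
```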
FORMULA
Best-Fit Line
m = cov(x, y) / var(x)
b = ȳ − m·x̄
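The same formula in code (a sketch, on the same toy data as the MSE example):

```python
def best_fit(xs, ys):
    """Closed-form best-fit line: m = cov(x, y) / var(x), b = ȳ - m·x̄."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    cov_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / n
    var_x = sum((x - x_bar) ** 2 for x in xs) / n
    m = cov_xy / var_x
    return m, y_bar - m * x_bar

hours, scores = [1, 2, 3, 4], [52, 55, 61, 64]  # same toy data as before
print(best_fit(hours, scores))  # ≈ (4.2, 47.5)
```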
Gradient Descent
Start m=0, b=0
Compute gradients
Update
Repeat ✨
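The card as a loop (a sketch reusing `gradient_step`, `mse`, and the toy `hours`/`scores` from earlier; the step count is an arbitrary choice):

```python
m, b = 0.0, 0.0  # start at m = 0, b = 0
alpha = 0.01     # learning rate: small steps
for step in range(5000):                              # repeat
    m, b = gradient_step(hours, scores, m, b, alpha)  # compute gradients + update
print(m, b, mse(hours, scores, m, b))  # settles near the closed-form ≈ (4.2, 47.5)
```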
GLOSSARY
Feature (x): the input you know
Target (y): the output you predict
Loss: how wrong the line is
Model: the formula that turns x into ŷ
Mini Quiz 🎯
If m=2, b=3, x=4 → y = 2·4 + 3 = 11
TRY
If you study 3 hours, what score does the line predict? Double the hours: by how much does the prediction change?
You’ve got this! 🌈
Learning is like gradient descent: small steps → big magic.