Linear Regression

1. What is Linear Regression?
Linear Regression is one of the simplest and most fundamental algorithms in machine learning and statistics. It models the relationship between:
Input features (x)
A continuous output (y)
by fitting a linear equation to the observed data.
At its core, linear regression answers this question:
“How does the output change when the input changes?”
2. Why is Linear Regression important?
Even though it is simple, linear regression is crucial because:
It is the foundation of many advanced ML models
It teaches:
Loss functions
Optimization
Gradient descent
Bias–variance tradeoff
Many real-world systems still use it:
Forecasting
Trend analysis
Sensor calibration
Embedded systems & control
3. Intuition: The Best-Fit Line
Imagine a scatter plot of data points, where each point represents a pair (x, y).
Linear regression tries to find a straight line such that:
The line is as close as possible to all points
Errors are minimized overall, not individually
4. Simple Linear Regression (One Feature)
4.1 Model Equation
For a single feature:
$$y=mx+c$$
In ML notation:
$$\hat{y} = wx + b$$
Where:
x → input feature
w → weight (slope)
b → bias (intercept)
ŷ → predicted value
4.2 What Do w and b Mean?
Weight (w)
Controls steepness
How strongly x affects y
Bias (b)
Vertical shift
Value of y when x = 0
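To make the roles of w and b concrete, here is a minimal sketch. The function name `predict` and the values w = 2.0 and b = 1.0 are illustrative assumptions, not parameters fitted to any data in this article.

```python
def predict(x, w, b):
    """Return the linear prediction y_hat = w * x + b."""
    return w * x + b

# Illustrative (assumed) parameters, not fitted values
w, b = 2.0, 1.0

at_zero = predict(0.0, w, b)                         # equals the bias b
per_unit = predict(4.0, w, b) - predict(3.0, w, b)   # equals the weight w

print("Prediction at x = 0:", at_zero)
print("Change in prediction per unit of x:", per_unit)
```

Moving x by one unit changes the prediction by exactly w, and the prediction at x = 0 is exactly b, which matches the two descriptions above.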
5. From Geometry to Math
Each data point produces an error:
$$error_i = y_i - \hat{y}_i$$
Linear regression does not minimize the raw errors, because positive and negative errors would cancel each other out. It minimizes the squared errors instead.
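A small sketch of why summing raw errors is misleading: errors of opposite sign offset each other. The arrays `y` and `y_pred` below are made-up values chosen for illustration only.

```python
import numpy as np

# Hypothetical targets and predictions; every prediction is off by exactly 1
y = np.array([1.0, 2.0, 3.0, 4.0])
y_pred = np.array([2.0, 1.0, 4.0, 3.0])

errors = y - y_pred                 # [-1.,  1., -1.,  1.]
sum_raw = np.sum(errors)            # the signs cancel
sum_squared = np.sum(errors ** 2)   # squaring removes the sign

print("Sum of raw errors:    ", sum_raw)
print("Sum of squared errors:", sum_squared)
```

The raw sum is zero even though no prediction is correct, while the squared sum reflects the total mismatch.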
6. Loss Function (Cost Function)
Mean Squared Error (MSE)
$$J(w, b) = \frac{1}{n} \sum_{i=1}^{n} (y_i - (w x_i + b))^2$$
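Why squared?
Penalizes large errors more heavily than small ones
Differentiable everywhere, so gradient-based optimization applies
Convex in w and b (no local minima other than the global one)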
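The MSE formula can be sketched in a few lines of NumPy. The helper name `mse` is an assumption made for illustration; the data are the five points used in section 9, and the pair (0.6, 2.2) is the fit reported in section 10.

```python
import numpy as np

def mse(w, b, X, y):
    """Mean squared error J(w, b) for the model y_hat = w * X + b."""
    residuals = y - (w * X + b)
    return np.mean(residuals ** 2)

X = np.array([1, 2, 3, 4, 5], dtype=float)
y = np.array([2, 4, 5, 4, 5], dtype=float)

loss_at_zero = mse(0.0, 0.0, X, y)   # starting point used by gradient descent
loss_at_fit = mse(0.6, 2.2, X, y)    # parameters reported by scikit-learn

print("J(0, 0)     =", round(loss_at_zero, 4))
print("J(0.6, 2.2) =", round(loss_at_fit, 4))
```

A lower value of J means the line sits closer to the data on average, which is what the optimizer is trying to achieve.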

7. Optimization Goal
The ML problem becomes:
$$\min_{w,b} J(w, b)$$
This means: Find values of w and b that minimize prediction error.
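One common way to solve this minimization is gradient descent. As a sketch of the step that connects the loss to the update rule, differentiating J(w, b) with respect to each parameter gives the expressions below. The symbol α for the learning rate is an assumption introduced here; the implementation in section 9 calls it `lr`.

$$\frac{\partial J}{\partial w} = -\frac{2}{n} \sum_{i=1}^{n} x_i \, (y_i - (w x_i + b))$$

$$\frac{\partial J}{\partial b} = -\frac{2}{n} \sum_{i=1}^{n} (y_i - (w x_i + b))$$

Each iteration then moves both parameters a small step against the gradient:

$$w \leftarrow w - \alpha \frac{\partial J}{\partial w}, \qquad b \leftarrow b - \alpha \frac{\partial J}{\partial b}$$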
8. Multiple Linear Regression (Multiple Features)
Model Equation
$$\hat{y} = w_1 x_1 + w_2 x_2 + \dots + w_d x_d + b$$
Here d is the number of features (n already denotes the number of samples in the loss function).
Or vectorized:
$$\hat{y} = \mathbf{w}^T \mathbf{x} + b$$
Geometric Interpretation
Single feature → line
Two features → plane
N features → hyperplane
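The vectorized form maps directly onto a matrix-vector product. The sketch below assumes a hypothetical dataset of 4 samples with 2 features and hand-picked values for `w` and `b`; none of these numbers come from the article.

```python
import numpy as np

# Hypothetical data: 4 samples (rows), 2 features (columns)
X = np.array([[1.0, 2.0],
              [2.0, 0.0],
              [3.0, 1.0],
              [4.0, 3.0]])
w = np.array([0.5, -1.0])   # one weight per feature (assumed values)
b = 2.0                     # bias (assumed value)

# One prediction per row: y_hat[i] = w . X[i] + b
y_hat = X @ w + b
print(y_hat)
```

Stacking the samples as rows of `X` lets a single `X @ w` compute all predictions at once instead of looping over features.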

9. Implementation (Python)
import numpy as np

# Data
X = np.array([1, 2, 3, 4, 5], dtype=float)
y = np.array([2, 4, 5, 4, 5], dtype=float)

# Initialize parameters
w = 0.0
b = 0.0

# Hyperparameters
lr = 0.01        # learning rate
n = len(X)       # number of samples
epochs = 1000    # number of gradient steps

# Gradient descent
for _ in range(epochs):
    y_pred = w * X + b                         # current predictions
    dw = (-2 / n) * np.sum(X * (y - y_pred))   # gradient of the MSE w.r.t. w
    db = (-2 / n) * np.sum(y - y_pred)         # gradient of the MSE w.r.t. b
    w -= lr * dw
    b -= lr * db

print("Learned w (slope):", w)
print("Learned b (bias) :", b)
Output:
Learned w (slope): 0.6176946148762643
Learned b (bias) : 2.136116825825789
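The values above are still some distance from the least-squares fit reported in the next section, because 1000 steps at this learning rate do not fully converge. The sketch below wraps the same update rule in a function that also records the loss, so the effect of training longer can be checked. The function name `gradient_descent` and the choice of 20000 epochs are assumptions made for illustration.

```python
import numpy as np

def gradient_descent(X, y, lr=0.01, epochs=1000):
    """Fit y ~ w * X + b by batch gradient descent on the MSE.

    Returns the learned w, b and the loss recorded before each update.
    """
    w, b = 0.0, 0.0
    n = len(X)
    losses = []
    for _ in range(epochs):
        residual = y - (w * X + b)
        losses.append(np.mean(residual ** 2))
        # Both gradients use the same residual, so the update is simultaneous
        w -= lr * (-2 / n) * np.sum(X * residual)
        b -= lr * (-2 / n) * np.sum(residual)
    return w, b, losses

X = np.array([1, 2, 3, 4, 5], dtype=float)
y = np.array([2, 4, 5, 4, 5], dtype=float)

for epochs in (1000, 20000):
    w, b, losses = gradient_descent(X, y, epochs=epochs)
    print(f"epochs={epochs:>6}  w={w:.4f}  b={b:.4f}  last recorded MSE={losses[-1]:.4f}")
```

Tracking the loss is a cheap way to tell whether the loop has flattened out or was simply stopped early.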
10. Using scikit-learn
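from sklearn.linear_model import LinearRegression

# X and y are the arrays from section 9; scikit-learn expects a 2-D feature matrix
model = LinearRegression()
model.fit(X.reshape(-1, 1), y)

print("w (coef):", round(model.coef_[0], 4))
print("b (intercept):", round(model.intercept_, 4))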
Output:
w (coef): 0.6
b (intercept): 2.2
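For a single feature, the least-squares solution can also be written in closed form: the slope is the sum of the products of centered x and y values divided by the sum of squared centered x values, and the intercept follows from the means. The sketch below applies this to the same five data points as a cross-check; the variable names are illustrative.

```python
import numpy as np

X = np.array([1, 2, 3, 4, 5], dtype=float)
y = np.array([2, 4, 5, 4, 5], dtype=float)

x_mean, y_mean = X.mean(), y.mean()

# Slope: co-variation of x and y divided by the variation of x
w = np.sum((X - x_mean) * (y - y_mean)) / np.sum((X - x_mean) ** 2)
# Intercept: the fitted line passes through the point (x_mean, y_mean)
b = y_mean - w * x_mean

print(f"w = {w:.4f}, b = {b:.4f}")   # w = 0.6000, b = 2.2000
```

This agrees with the scikit-learn result, and it is the point that the gradient descent loop in section 9 approaches as the number of epochs grows.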
11. Where Linear Regression Fails
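Non-linear relationships (a straight line cannot capture curvature)
Outliers (squared error gives them disproportionate influence)
High-dimensional sparse data (prone to overfitting without regularization)
Strong feature correlation (multicollinearity makes the weights unstable)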
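The sensitivity to outliers is easy to demonstrate. The sketch below uses synthetic data lying exactly on y = 2x + 1 and then corrupts a single point; the data and the outlier value 100 are made up for illustration. `np.polyfit` with degree 1 is used here as a convenient least-squares line fit.

```python
import numpy as np

x = np.arange(10, dtype=float)

# Synthetic data lying exactly on the line y = 2x + 1
y_clean = 2.0 * x + 1.0

# Same data with one corrupted measurement at the last point
y_outlier = y_clean.copy()
y_outlier[-1] = 100.0

# Degree-1 polyfit returns [slope, intercept] of the least-squares line
slope_clean, intercept_clean = np.polyfit(x, y_clean, 1)
slope_outlier, intercept_outlier = np.polyfit(x, y_outlier, 1)

print(f"Clean data:   slope = {slope_clean:.3f}")
print(f"With outlier: slope = {slope_outlier:.3f}")
```

One bad point out of ten is enough to pull the fitted slope far away from the true value of 2, because its squared error dominates the loss.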
12. Why Linear Regression Still Matters
Interpretable
Fast
Stable
Foundational for:
Logistic Regression
Neural Networks
Kalman Filters
Control Systems
13. Final Takeaway
Even in deep-learning-dominated ML stacks:
Linear regression remains essential
Understanding it deeply makes every other ML concept easier




