Artificial Intelligence: Notes
  • Supervised Learning
    • Trees
      • AdaBoost
      • ID3
      • Random Forests
    • Convolutional Neural Networks
    • DNN for Classification
    • K-Nearest Neighbors
    • LDA
    • Logistic Regression
    • Perceptron
    • QDA
    • SVM
  • Unsupervised Learning
    • DBSCAN
    • Deep Autoencoder
    • Generative Adversarial Networks (GAN)
    • K-Means Clustering
    • Linear Regression
      • Model
      • Algorithm
      • Pros
        • Linear Regression
        • Ridge Regression
        • Lasso Regression
      • Cons
        • Linear Regression
        • Ridge Regression
        • Lasso Regression
    • Principal Component Analysis (PCA)
    • Restricted Boltzmann Machines (RBM)
  • Reinforcement Learning
    • Markov Decision Process
    • Q-Learning
    • Deep Q-Learning
  • Ensemble Strategies
    • Ensemble Learning
    • Fine-tuning and resampling
  • Other Techniques
    • Expectation-Maximization
    • Recurrent Neural Networks

Linear Regression

Model

Given a sample $(x_i, y_i)_{i=1}^{n}$, we try to fit it with the equation:

$$Y = \beta_0 + \beta_1 X_1 + \dots + \beta_p X_p + \varepsilon$$

where $E(\varepsilon) = 0$. The Residual Standard Error (RSE) is an estimate of the standard deviation of $\varepsilon$ and is given by:

$$\mathrm{RSE} = \sqrt{\frac{1}{n-2}\sum_{k=1}^{n}\left(y_k - \hat{y}_k\right)^2}$$

The estimators of the $\beta_i$'s are found by minimizing the RSE (equivalently, the residual sum of squares), which can be thought of as a measure of "lack of fit". Other objectives can be used by adding, e.g., a penalty on the $L_1$ (Lasso) or $L_2$ (Ridge) norm of $\beta = (\beta_0, \dots, \beta_p)$.
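
As a minimal sketch of these objectives (assuming NumPy, a design matrix `X` whose first column is ones for the intercept, and illustrative names `rse` and `loss` that are not from the original notes), the three variants differ only in the penalty term added to the residual sum of squares:

```python
import numpy as np

def rse(y, y_hat):
    """Residual Standard Error: estimate of the standard deviation of epsilon."""
    n = len(y)
    return np.sqrt(np.sum((y - y_hat) ** 2) / (n - 2))

def loss(beta, X, y, penalty=None, lam=1.0):
    """Residual sum of squares ("lack of fit") plus an optional penalty on beta.

    Note: in practice the intercept beta_0 is usually excluded from the penalty;
    it is penalized here only to keep the sketch short.
    """
    residual = y - X @ beta
    rss = residual @ residual
    if penalty == "l1":          # Lasso: L1 norm of the coefficients
        return rss + lam * np.sum(np.abs(beta))
    if penalty == "l2":          # Ridge: squared L2 norm of the coefficients
        return rss + lam * np.sum(beta ** 2)
    return rss                   # plain linear regression
```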

Algorithm

Standard Linear Regression (without further penalization) has an explicit solution:

$$\hat{\beta} = (X^T X)^{-1} X^T Y$$

Computing $(X^T X)^{-1}$ can be expensive when $X$ is large, but the matrix products and the solve can be parallelized, e.g. on a GPU.
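
A minimal sketch of this closed-form fit (assuming NumPy and a design matrix `X` whose rows are the samples; `fit_ols` is just an illustrative name). Solving the normal equations directly is the usual numerically safer alternative to forming the inverse explicitly:

```python
import numpy as np

def fit_ols(X, y):
    """Closed-form least squares: solve (X^T X) beta = X^T y for beta."""
    XtX = X.T @ X
    Xty = X.T @ y
    # np.linalg.lstsq would be even more robust if X^T X is ill-conditioned.
    return np.linalg.solve(XtX, Xty)

# Tiny usage example; the first column of ones gives the intercept beta_0.
rng = np.random.default_rng(0)
X = np.column_stack([np.ones(100), rng.normal(size=(100, 2))])
y = X @ np.array([1.0, 2.0, -0.5]) + 0.1 * rng.normal(size=100)
print(fit_ols(X, y))   # approximately [1.0, 2.0, -0.5]
```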

Pros

Linear Regression

  • Interpretability: Coefficients represent the relationship between independent and dependent variables.
  • Simplicity: Easy to implement and understand.
  • Computationally efficient: Training is faster compared to more complex models.
  • Well-suited for linear relationships: Works well when the relationship between variables is approximately linear.

Ridge Regression

  • Handles multicollinearity: Regularization term mitigates the impact of correlated predictors.
  • Stable solutions: Less sensitive to changes in input data.
  • Reduces model complexity: Helps prevent overfitting.

Lasso Regression

  • Feature selection: Encourages sparsity, leading to automatic variable selection (illustrated in the sketch after this list).
  • Simplicity and interpretability: Simplifies the model by setting some coefficients to zero.
  • Handles multicollinearity: Can be used when predictors are highly correlated.
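
A small illustration of these points (a sketch using scikit-learn with synthetic data and arbitrary regularization strengths, not taken from the notes): with two highly correlated predictors and one irrelevant one, Ridge tends to spread weight across the correlated pair, while Lasso drives some coefficients exactly to zero.

```python
import numpy as np
from sklearn.linear_model import LinearRegression, Ridge, Lasso

rng = np.random.default_rng(0)
n = 200
x1 = rng.normal(size=n)
x2 = x1 + 0.05 * rng.normal(size=n)      # x2 is highly correlated with x1
x3 = rng.normal(size=n)                  # irrelevant predictor
X = np.column_stack([x1, x2, x3])
y = 3.0 * x1 + 0.5 * rng.normal(size=n)

for model in (LinearRegression(), Ridge(alpha=1.0), Lasso(alpha=0.1)):
    model.fit(X, y)
    print(type(model).__name__, np.round(model.coef_, 2))
# Typical outcome: plain regression splits the weight unstably between x1 and x2,
# Ridge shares it more evenly, and Lasso zeroes out the redundant/irrelevant ones.
```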

Cons

Linear Regression

  • Assumes linearity: Might not capture complex, non-linear relationships in the data (the model is only linear in the coefficients, so transformed predictors can be added).
  • Sensitive to outliers: Outliers can disproportionately affect the model.
  • Assumes independence of errors: Violation of assumptions can lead to inaccurate results.
  • Limited in handling multicollinearity: Struggles when predictors are highly correlated.

Ridge Regression

  • Not feature selection: Ridge regression includes all features; it won't eliminate irrelevant predictors.
  • Limited interpretability: Coefficients may be harder to interpret.

Lasso Regression

  • Unstable with correlated predictors: May arbitrarily select one and ignore others.
  • Not robust to outliers: Sensitive to extreme values.
  • May shrink coefficients to zero: This can be an issue if all features are relevant.