Linear Regression
Model
Given a sample $(x_i, y_i)_{i=1}^{n}$, the model reads

$$y = X\beta + \varepsilon,$$

where $y \in \mathbb{R}^{n}$ is the response vector, $X \in \mathbb{R}^{n \times p}$ the design matrix of predictors, $\beta \in \mathbb{R}^{p}$ the coefficient vector, and $\varepsilon$ a zero-mean noise term.

The estimators of the coefficients minimize the residual sum of squares:

$$\hat{\beta} = \operatorname*{arg\,min}_{\beta} \, \lVert y - X\beta \rVert_2^2.$$
Algorithm
Standard Linear Regression (without further penalization) has an explicit solution:

$$\hat{\beta} = (X^{\top} X)^{-1} X^{\top} y.$$

Computing the inverse of $X^{\top} X$ costs $O(p^3)$ in general, so in practice one solves the normal equations $X^{\top} X \hat{\beta} = X^{\top} y$ directly (e.g. via a Cholesky or QR factorization) rather than forming the inverse explicitly.
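As a quick illustration, here is a minimal NumPy sketch of the closed-form estimator on synthetic data (the data, seed, and variable names are invented for the example):

```python
import numpy as np

# Hypothetical toy data: n = 100 samples, p = 3 features.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
beta_true = np.array([2.0, -1.0, 0.5])
y = X @ beta_true + rng.normal(scale=0.1, size=100)

# Closed-form OLS estimator: solve the normal equations
# X^T X beta = X^T y instead of explicitly inverting X^T X.
beta_hat = np.linalg.solve(X.T @ X, X.T @ y)
print(beta_hat)  # should be close to beta_true
```

Note that `np.linalg.solve` solves the normal equations directly, which is cheaper and numerically safer than computing the inverse with `np.linalg.inv`.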
Pros
Linear Regression
- Interpretability: Each coefficient directly quantifies the effect of its predictor on the response.
- Simplicity: Easy to implement and understand.
- Computationally efficient: Training is fast compared to more complex models.
- Well-suited for linear relationships: Works well when the relationship between variables is approximately linear.
Ridge Regression
- Handles multicollinearity: The L2 regularization term mitigates the impact of correlated predictors (illustrated in the sketch after this list).
- Stable solutions: Less sensitive to changes in input data.
- Reduces model complexity: Helps prevent overfitting.
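A minimal sketch of the closed-form ridge estimator $\hat{\beta} = (X^{\top} X + \lambda I)^{-1} X^{\top} y$ on hypothetical data with two nearly identical predictors (the function name and data are invented for the example):

```python
import numpy as np

def ridge_fit(X, y, lam):
    """Closed-form ridge estimator: (X^T X + lam * I)^{-1} X^T y."""
    p = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ y)

# Two almost perfectly correlated predictors (multicollinearity).
rng = np.random.default_rng(0)
x = rng.normal(size=200)
X = np.column_stack([x, x + 1e-3 * rng.normal(size=200)])
y = x + rng.normal(scale=0.1, size=200)

print(ridge_fit(X, y, lam=0.0))  # near-singular system: large, unstable coefficients
print(ridge_fit(X, y, lam=1.0))  # penalized system: small, stable coefficients
```

With $\lambda = 0$ the system is nearly singular and the two coefficients typically blow up in opposite directions; a modest penalty keeps them small and stable.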
Lasso Regression
- Feature selection: The L1 penalty encourages sparsity, leading to automatic variable selection (see the sketch after this list).
- Simplicity and interpretability: Simplifies the model by setting some coefficients to zero.
- Handles multicollinearity: Can be used when predictors are highly correlated.
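A minimal scikit-learn sketch of this sparsity effect; the data and the `alpha` value are invented for the example:

```python
import numpy as np
from sklearn.linear_model import Lasso

# Synthetic data where only 2 of 10 features carry signal.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 10))
y = 3.0 * X[:, 0] - 2.0 * X[:, 4] + rng.normal(scale=0.1, size=200)

# The L1 penalty drives irrelevant coefficients exactly to zero.
model = Lasso(alpha=0.1).fit(X, y)
print(model.coef_)  # mostly exact zeros; nonzero weights near indices 0 and 4
```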
Cons
Linear Regression
- Assumes linearity (in the parameters): It may not capture complex, non-linear relationships unless the features are transformed.
- Sensitive to outliers: Because errors are squared, outliers can disproportionately affect the fit (see the sketch after this list).
- Assumes independence of errors: Correlated errors invalidate the usual standard errors and can lead to misleading inference.
- Limited in handling multicollinearity: Struggles when predictors are highly correlated.
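A minimal sketch of the outlier sensitivity on a toy one-dimensional fit (the data and helper name are invented for the example):

```python
import numpy as np

def ols_line(x, y):
    """Least-squares slope and intercept for y ~ a*x + b."""
    A = np.column_stack([x, np.ones_like(x)])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef

rng = np.random.default_rng(0)
x = np.linspace(0, 10, 50)
y = 2.0 * x + 1.0 + rng.normal(scale=0.5, size=50)
print(ols_line(x, y))          # slope near 2.0, intercept near 1.0

# A single extreme point pulls the squared-error fit noticeably.
x_out = np.append(x, 10.0)
y_out = np.append(y, 100.0)
print(ols_line(x_out, y_out))  # slope and intercept visibly shifted
```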
Ridge Regression
- No feature selection: Ridge regression keeps all features; it shrinks but does not eliminate irrelevant predictors.
- Limited interpretability: Shrunken coefficients no longer estimate the raw effect sizes, so they are harder to interpret.
Lasso Regression
- Unstable with correlated predictors: It may arbitrarily select one of a group of correlated features and ignore the others (see the sketch after this list).
- Not robust to outliers: Sensitive to extreme values.
- May shrink coefficients exactly to zero: This becomes a problem when all features are genuinely relevant.
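A minimal sketch of the instability with correlated predictors, reusing the near-duplicate-feature setup from the ridge example above (data and `alpha` are invented for the example):

```python
import numpy as np
from sklearn.linear_model import Lasso

# Two copies of (almost) the same signal: lasso tends to keep one
# and zero out the other, and which one survives can flip with the noise.
rng = np.random.default_rng(0)
x = rng.normal(size=200)
X = np.column_stack([x, x + 1e-3 * rng.normal(size=200)])
y = x + rng.normal(scale=0.1, size=200)

print(Lasso(alpha=0.05).fit(X, y).coef_)  # typically one coefficient near 1, the other (near) zero
```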