Artificial Intelligence: Notes
  • Supervised Learning
    • Trees
      • AdaBoost
      • ID3
      • Random Forests
    • Convolutional Neural Networks
    • DNN for Classification
    • K-Nearest Neighbors
    • LDA
    • Logistic Regression
    • Perceptron
    • QDA
    • SVM
  • Unsupervised Learning
    • DBSCAN
    • Deep Autoencoder
    • Generative Adversarial Networks (GAN)
    • K-Means Clustering
    • Linear Regression
      • Model
      • Algorithm
      • Pros
        • Linear Regression
        • Ridge Regression
        • Lasso Regression
      • Cons
        • Linear Regression
        • Ridge Regression
        • Lasso Regression
    • Principal Component Analysis (PCA)
    • Restricted Boltzmann Machines (RBM)
  • Reinforcement Learning
    • Markov Decision Process
    • Q-Learning
    • Deep Q-Learning
  • Ensemble Strategies
    • Ensemble Learning
    • Fine-tuning and resampling
  • Other Techniques
    • Expectation-Maximization
    • Recurrent Neural Networks

Linear Regression

Model

Given a sample $(x_i, y_i)_{i=1}^{n}$, we try to fit it with the equation:

$$Y = \beta_0 + \beta_1 X_1 + \dots + \beta_p X_p + \varepsilon$$

where $E(\varepsilon) = 0$. The Residual Standard Error (RSE) is an estimate of the standard deviation of $\varepsilon$ and is given by:

$$\mathrm{RSE} = \sqrt{\frac{1}{n-2}\sum_{k=1}^{n}\left(y_k - \hat{y}_k\right)^2}$$

The estimators of the $\beta_i$'s are found by minimizing the RSE (equivalently, the residual sum of squares), which can be thought of as a measure of "lack of fit". Other objectives can be used by adding, e.g., a penalty on the $L_1$ (Lasso) or $L_2$ (Ridge) norm of $\beta = (\beta_0, \dots, \beta_p)$.
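
As a minimal sketch of these objectives (assuming NumPy, a design matrix `X` whose first column is ones for the intercept, and illustrative names `rse` and `loss` that are not from the original notes), the three variants differ only in the penalty term added to the residual sum of squares:

```python
import numpy as np

def rse(y, y_hat):
    """Residual Standard Error: estimate of the standard deviation of epsilon."""
    n = len(y)
    return np.sqrt(np.sum((y - y_hat) ** 2) / (n - 2))

def loss(beta, X, y, penalty=None, lam=1.0):
    """Residual sum of squares ("lack of fit") plus an optional penalty on beta.

    Note: in practice the intercept beta_0 is usually excluded from the penalty;
    it is penalized here only to keep the sketch short.
    """
    residual = y - X @ beta
    rss = residual @ residual
    if penalty == "l1":          # Lasso: L1 norm of the coefficients
        return rss + lam * np.sum(np.abs(beta))
    if penalty == "l2":          # Ridge: squared L2 norm of the coefficients
        return rss + lam * np.sum(beta ** 2)
    return rss                   # plain linear regression
```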

Algorithm

Standard Linear Regression (without further penalization) has an explicit solution:

$$\hat{\beta} = (X^T X)^{-1} X^T Y$$

Computing $(X^T X)^{-1}$ can be expensive when $X$ is large, but the matrix products and the solve can be parallelized, e.g. on a GPU.
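
A minimal sketch of this closed-form fit (assuming NumPy and a design matrix `X` whose rows are the samples; `fit_ols` is just an illustrative name). Solving the normal equations directly is the usual numerically safer alternative to forming the inverse explicitly:

```python
import numpy as np

def fit_ols(X, y):
    """Closed-form least squares: solve (X^T X) beta = X^T y for beta."""
    XtX = X.T @ X
    Xty = X.T @ y
    # np.linalg.lstsq would be even more robust if X^T X is ill-conditioned.
    return np.linalg.solve(XtX, Xty)

# Tiny usage example; the first column of ones gives the intercept beta_0.
rng = np.random.default_rng(0)
X = np.column_stack([np.ones(100), rng.normal(size=(100, 2))])
y = X @ np.array([1.0, 2.0, -0.5]) + 0.1 * rng.normal(size=100)
print(fit_ols(X, y))   # approximately [1.0, 2.0, -0.5]
```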

Pros

Linear Regression

  • Interpretability: Coefficients represent the relationship between independent and dependent variables.
  • Simplicity: Easy to implement and understand.
  • Computationally efficient: Training is faster compared to more complex models.
  • Well-suited for linear relationships: Works well when the relationship between variables is approximately linear.

Ridge Regression

  • Handles multicollinearity: Regularization term mitigates the impact of correlated predictors.
  • Stable solutions: Less sensitive to changes in input data.
  • Reduces model complexity: Helps prevent overfitting.

Lasso Regression

  • Feature selection: Encourages sparsity, leading to automatic variable selection (illustrated in the sketch after this list).
  • Simplicity and interpretability: Simplifies the model by setting some coefficients to zero.
  • Handles multicollinearity: Can be used when predictors are highly correlated.
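
A small illustration of these points (a sketch using scikit-learn with synthetic data and arbitrary regularization strengths, not taken from the notes): with two highly correlated predictors and one irrelevant one, Ridge tends to spread weight across the correlated pair, while Lasso drives some coefficients exactly to zero.

```python
import numpy as np
from sklearn.linear_model import LinearRegression, Ridge, Lasso

rng = np.random.default_rng(0)
n = 200
x1 = rng.normal(size=n)
x2 = x1 + 0.05 * rng.normal(size=n)      # x2 is highly correlated with x1
x3 = rng.normal(size=n)                  # irrelevant predictor
X = np.column_stack([x1, x2, x3])
y = 3.0 * x1 + 0.5 * rng.normal(size=n)

for model in (LinearRegression(), Ridge(alpha=1.0), Lasso(alpha=0.1)):
    model.fit(X, y)
    print(type(model).__name__, np.round(model.coef_, 2))
# Typical outcome: plain regression splits the weight unstably between x1 and x2,
# Ridge shares it more evenly, and Lasso zeroes out the redundant/irrelevant ones.
```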

Cons

Linear Regression

  • Assumes linearity: Might not capture complex, non-linear relationships in the data (the model is only linear in the coefficients, so transformed predictors can be added).
  • Sensitive to outliers: Outliers can disproportionately affect the model.
  • Assumes independence of errors: Violation of assumptions can lead to inaccurate results.
  • Limited in handling multicollinearity: Struggles when predictors are highly correlated.

Ridge Regression

  • Not feature selection: Ridge regression includes all features; it won't eliminate irrelevant predictors.
  • Limited interpretability: Coefficients may be harder to interpret.

Lasso Regression

  • Unstable with correlated predictors: May arbitrarily select one and ignore others.
  • Not robust to outliers: Sensitive to extreme values.
  • May shrink coefficients to zero: This can be an issue if all features are relevant.