Artificial Intelligence : Notes
  • Supervised Learning
    • Trees
      • AdaBoost
      • ID3
      • Random Forests
    • Convolutional Neural Networks
    • DNN for Classification
    • K-Nearest Neighbors
    • LDA
    • Logistic Regression
    • Perceptron
    • QDA
    • SVM
  • Unsupervised Learning
    • DBSCAN
    • Deep Autoencoder
    • Generative Adversarial Networks (GAN)
    • K-Means Clustering
    • Linear Regression
    • Principal Component Analysis (PCA)
    • Restricted Boltzmann Machines (RBM)
  • Reinforcement Learning
    • Markov Decision Process
    • Q-Learning
    • Deep Q-Learning
  • Ensemble Strategies
    • Ensemble Learning
    • Fine-tuning and resampling
      • Resampling techniques
        • Validation set
        • Leave-One-Out Cross-Validation
        • k-Fold Cross-Validation
  • Other Techniques
    • Expectation-Maximization
    • Recurrent Neural Networks

Fine-tuning and resampling

When assessing a machine learning model, it's common to divide the dataset into three parts: a training set, a validation set, and a test set. Each set serves a specific purpose in training, fine-tuning, and evaluating the model, and this division helps ensure the model's performance generalizes well to unseen data. Here's why each set is important (a short code sketch of the split follows the list):

  1. Training Set:
    • Purpose: The training set is used to train the model. The model learns patterns, relationships, and features from this set.
    • Training Process: The model's parameters (weights and biases) are adjusted based on the input-output pairs in the training set.
    • Role: The training set helps the model to learn the underlying patterns in the data and build a representation that can make accurate predictions.
  2. Validation Set:
    • Purpose: The validation set is used during the training process to fine-tune the model and optimize hyperparameters.
    • Hyperparameter Tuning: Hyperparameters (e.g., learning rate, regularization strength) are not learned from the training set; they are chosen by comparing how models trained with different settings perform on the validation set. This helps prevent overfitting and ensures the model generalizes well to new, unseen data.
    • Role: The validation set serves as an unbiased evaluation during the model development phase, guiding adjustments to improve performance.
  3. Test Set:
    • Purpose: The test set is reserved for evaluating the model's performance after it has been trained and fine-tuned.
    • Unseen Data Evaluation: The test set contains data that the model has never seen during training or validation. It provides an unbiased assessment of how well the model generalizes to new, unseen data.
    • Role: The test set helps estimate the model's performance in real-world scenarios and ensures that the evaluation is not biased by the training or validation data.
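
As an illustration, here is a minimal sketch of such a three-way split using scikit-learn's train_test_split (not part of the original notes; X and y are toy placeholders, and 60/20/20 is one common choice of proportions, not a rule):

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Toy data standing in for a real dataset.
X = np.random.rand(100, 4)             # 100 samples, 4 features
y = np.random.randint(0, 2, size=100)  # binary labels

# First hold out 20% as the test set, then carve a validation set
# out of the rest (25% of the remaining 80%, i.e. 20% overall).
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.20, random_state=0)
X_train, X_val, y_train, y_val = train_test_split(
    X_train, y_train, test_size=0.25, random_state=0)

print(len(X_train), len(X_val), len(X_test))  # 60 20 20
```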

Resampling techniques

Validation set

Split the dataset into two parts: a training set and a validation (hold-out) set. Typically, keep 60% to 80% of the observations to train the model and use the rest to estimate the test error.
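
A minimal sketch of the validation-set approach, assuming scikit-learn; the synthetic dataset and the logistic regression model are placeholders, not part of the original notes:

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=200, random_state=0)

# Keep 70% for training; the held-out 30% estimates the test error.
X_train, X_val, y_train, y_val = train_test_split(
    X, y, test_size=0.30, random_state=0)

model = LogisticRegression().fit(X_train, y_train)
print("validation accuracy:", model.score(X_val, y_val))
```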

Leave-One-Out Cross-Validation

Exclude one observation from the dataset and train the model on the remaining n − 1 observations. Repeat for all observations and estimate the test error by averaging the n individual errors.

loocv.png | center

source: #ISLP
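
A sketch of LOOCV using scikit-learn's LeaveOneOut splitter (the dataset and model are toy placeholders):

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import LeaveOneOut, cross_val_score

X, y = make_classification(n_samples=50, random_state=0)

# n model fits: each observation is held out once as a
# single-point test set, and the scores are averaged.
scores = cross_val_score(LogisticRegression(), X, y, cv=LeaveOneOut())
print("LOOCV error estimate:", 1 - scores.mean())
```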

k-Fold Cross-Validation

Shuffle the dataset, then split it into k folds (typically k = 5 or 10). Train the model on the other k − 1 folds, evaluate it on the held-out fold, repeat so that each fold is held out once, and estimate the test error by averaging.

k-folds.png | center | 550

source: #ISLP
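
A corresponding sketch with scikit-learn's KFold splitter (k = 5 here, with shuffling as described above; dataset and model are toy placeholders):

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import KFold, cross_val_score

X, y = make_classification(n_samples=200, random_state=0)

# Shuffle, then split into k = 5 folds; each fold is held out once.
cv = KFold(n_splits=5, shuffle=True, random_state=0)
scores = cross_val_score(LogisticRegression(), X, y, cv=cv)
print("5-fold CV error estimate:", 1 - scores.mean())
```

Compared with LOOCV, this requires only k model fits instead of n, at the cost of a somewhat more biased error estimate.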
