Artificial Intelligence : Notes

K-Means

  • Unsupervised clustering algorithm
  • The number of clusters K is a hyper-parameter

Algorithm

  • Start with K random centroids (points in R^p, where p is the number of features)
  • Repeat until convergence (the assignments no longer change):
    • Assign each data point to its closest centroid
    • Replace each centroid with the mean of the points assigned to its cluster
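
The two steps above can be sketched in a few lines of NumPy. This is a minimal illustration, not a reference implementation: the function name `kmeans`, the choice to draw the initial centroids from the data points, the `n_iter` cap, and the toy data are all assumptions made for the example.

```python
import numpy as np

def kmeans(X, k, n_iter=100, seed=0):
    """K-Means sketch. X has shape (n, p); returns (centroids, labels)."""
    rng = np.random.default_rng(seed)
    # Assumption: initialise the centroids with k distinct data points chosen at random.
    centroids = X[rng.choice(len(X), size=k, replace=False)].astype(float)
    labels = np.zeros(len(X), dtype=int)
    for _ in range(n_iter):
        # Assignment step: squared Euclidean distance from every point to every centroid.
        dists = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        labels = dists.argmin(axis=1)
        # Update step: each centroid becomes the mean of the points assigned to it.
        new_centroids = centroids.copy()
        for j in range(k):
            members = X[labels == j]
            if len(members) > 0:  # keep the old centroid if a cluster is empty
                new_centroids[j] = members.mean(axis=0)
        # Convergence check: stop once the centroids no longer move.
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids
    return centroids, labels

# Toy data (hypothetical): two well-separated blobs in R^2.
rng = np.random.default_rng(42)
X = np.vstack([rng.normal(0.0, 0.5, size=(50, 2)),
               rng.normal(5.0, 0.5, size=(50, 2))])
centroids, labels = kmeans(X, k=2)
```

On data like this, each blob should end up in its own cluster. Which blob receives label 0 depends on the random initialisation.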

Figure: kmeans.gif

source: https://www.maartengrootendorst.com/assets/images/posts/2019-07-30-customer/kmeans.gif

Pros

  • Easy to implement
  • Easy to interpret

Cons

  • Sensitive to correlated features (the Euclidean distance treats features as independent)
  • Sensitive to features with different variances (high-variance features dominate the distance)
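
To illustrate the variance issue, the sketch below measures how much of the squared distance between two points comes from each feature, before and after rescaling every feature to unit variance. The helper `standardize` and the four data points are hypothetical and chosen only for illustration.

```python
import numpy as np

def standardize(X):
    """Rescale each feature (column) to zero mean and unit variance."""
    mean = X.mean(axis=0)
    std = X.std(axis=0)
    std[std == 0] = 1.0  # leave constant features unchanged instead of dividing by zero
    return (X - mean) / std

# Hypothetical data: feature 0 is measured in thousands, feature 1 in units.
X = np.array([[1000.0, 1.0],
              [2000.0, 1.2],
              [1100.0, 9.0],
              [2100.0, 9.3]])
Z = standardize(X)

# Share of the squared distance between points 0 and 2 contributed by each feature.
raw_share = (X[0] - X[2]) ** 2 / ((X[0] - X[2]) ** 2).sum()
std_share = (Z[0] - Z[2]) ** 2 / ((Z[0] - Z[2]) ** 2).sum()
print("raw:", raw_share, "standardized:", std_share)
```

On the raw data the distance is driven almost entirely by feature 0, even though the two points differ far more, relative to the spread of the data, in feature 1. After standardization the balance flips.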

See also

  • LDA (Linear Discriminant Analysis) for the supervised case (classification with known labels)
  • PCA (Principal Component Analysis) to decorrelate features and, with whitening, normalize their variances
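
As a rough sketch of the PCA preprocessing mentioned above, the snippet below projects the data onto its principal axes and rescales each axis to unit variance (whitening). It assumes that "normalize" refers to whitening. The function name `pca_whiten`, the `eps` term, and the toy data are hypothetical.

```python
import numpy as np

def pca_whiten(X, eps=1e-8):
    """Decorrelate the features of X (shape (n, p), p >= 2) and give each unit variance."""
    Xc = X - X.mean(axis=0)                 # centre the data
    cov = np.cov(Xc, rowvar=False)          # (p, p) covariance matrix
    eigvals, eigvecs = np.linalg.eigh(cov)  # eigh: the covariance matrix is symmetric
    # Rotate onto the principal axes, then divide each axis by its standard deviation.
    return (Xc @ eigvecs) / np.sqrt(eigvals + eps)

# Toy data (hypothetical): two correlated features with very different variances.
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 2)) @ np.array([[2.0, 0.0], [1.5, 0.5]])
W = pca_whiten(X)
print(np.round(np.cov(W, rowvar=False), 3))
```

The covariance of the whitened data should be close to the identity matrix, so the Euclidean distance used by K-Means weighs every direction equally.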