Deep Neural Networks for Classification

Deep neural networks can be used for classification. The last activation function is usually a softmax, which maps a vector of logits $z = (z_1, \dots, z_K)$ to a probability distribution over the $K$ classes:

$$\sigma(z)_i = \frac{\exp(z_i)}{\sum_{k=1}^{K} \exp(z_k)}$$
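As a concrete illustration, here is a minimal NumPy sketch of the softmax (the function name and array shapes are illustrative assumptions; the max-subtraction is a standard numerical-stability trick, not something these notes prescribe):

```python
import numpy as np

def softmax(z: np.ndarray) -> np.ndarray:
    """Softmax over the last axis of a logit array (illustrative sketch)."""
    # Subtract the max for numerical stability; the result is unchanged
    # because softmax is invariant to adding a constant to every logit.
    shifted = z - np.max(z, axis=-1, keepdims=True)
    exp_z = np.exp(shifted)
    return exp_z / np.sum(exp_z, axis=-1, keepdims=True)

logits = np.array([2.0, 1.0, 0.1])
print(softmax(logits))  # roughly [0.659 0.242 0.099] -- sums to 1
```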

Binary classification

One typically trains the neural network with the binary cross-entropy loss.

  • In the discrete case, where the true label is $y \in \{0, 1\}$ and $p$ is the predicted probability of the positive class, the loss for a single instance is
    $$-\left[y \cdot \log(p) + (1 - y) \cdot \log(1 - p)\right]$$
  • In the continuous case, when the probabilities predicted by the network are not strictly 0 or 1, the same formula applies; it is then read as a measure of dissimilarity between the predicted probability distribution and the true binary labels (see the sketch after this list).
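A minimal NumPy sketch of this loss, averaged over a batch (the function name and the clipping epsilon are illustrative assumptions):

```python
import numpy as np

def binary_cross_entropy(y_true: np.ndarray, p_pred: np.ndarray) -> float:
    """Mean of -[y*log(p) + (1-y)*log(1-p)] over a batch (sketch)."""
    # Clip predictions away from 0 and 1 so the logarithms stay finite.
    eps = 1e-12
    p = np.clip(p_pred, eps, 1.0 - eps)
    return float(np.mean(-(y_true * np.log(p) + (1.0 - y_true) * np.log(1.0 - p))))

y = np.array([1.0, 0.0, 1.0])   # true binary labels
p = np.array([0.9, 0.2, 0.7])   # predicted probabilities of the positive class
print(binary_cross_entropy(y, p))  # roughly 0.228
```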

Multiple classes

For classification problems involving more than two classes, the binary cross-entropy loss extends to the categorical cross-entropy loss. For a single instance, it is defined as:

$$-\sum_{c=1}^{C} y_c \cdot \log(p_c)$$

where:

  • $C$ is the number of classes
  • $y_c$ is an indicator (1 if the actual class is $c$, 0 otherwise)
  • $p_c$ is the predicted probability that the instance belongs to class $c$

The goal during training is to minimize the average of this categorical cross-entropy loss over all instances in the training dataset, as in the sketch below.
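A minimal NumPy sketch with one-hot labels, averaged over a batch (the function name, shapes, and clipping epsilon are illustrative assumptions):

```python
import numpy as np

def categorical_cross_entropy(y_true: np.ndarray, p_pred: np.ndarray) -> float:
    """Mean of -sum_c y_c * log(p_c) over a batch (sketch).

    y_true: one-hot labels, shape (n_instances, n_classes)
    p_pred: predicted class probabilities (e.g. softmax outputs), same shape
    """
    eps = 1e-12
    p = np.clip(p_pred, eps, 1.0)
    # Because y_true is one-hot, the inner sum picks out -log(p) of the true class.
    return float(np.mean(-np.sum(y_true * np.log(p), axis=1)))

y = np.array([[1.0, 0.0, 0.0],
              [0.0, 0.0, 1.0]])
p = np.array([[0.7, 0.2, 0.1],
              [0.1, 0.3, 0.6]])
print(categorical_cross_entropy(y, p))  # roughly 0.434
```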
