Deep Neural Networks for Classification

Deep neural networks can be used for classification. The last activation function is usually a softmax, which maps a vector of logits $z = (z_1, \dots, z_K)$ to a probability distribution over the $K$ classes:

$$\sigma(z)_i = \frac{\exp(z_i)}{\sum_{k=1}^{K} \exp(z_k)}$$
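As a concrete illustration, here is a minimal NumPy sketch of the softmax (the function name and array shapes are illustrative assumptions; the max-subtraction is a standard numerical-stability trick, not something these notes prescribe):

```python
import numpy as np

def softmax(z: np.ndarray) -> np.ndarray:
    """Softmax over the last axis of a logit array (illustrative sketch)."""
    # Subtract the max for numerical stability; the result is unchanged
    # because softmax is invariant to adding a constant to every logit.
    shifted = z - np.max(z, axis=-1, keepdims=True)
    exp_z = np.exp(shifted)
    return exp_z / np.sum(exp_z, axis=-1, keepdims=True)

logits = np.array([2.0, 1.0, 0.1])
print(softmax(logits))  # roughly [0.659 0.242 0.099] -- sums to 1
```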

Binary classification

One typically trains the neural network with the binary cross-entropy loss.

  • In the discrete case, where the true label is $y \in \{0, 1\}$ and $p$ is the predicted probability of the positive class, the loss for a single instance is
    $$-\left[y \cdot \log(p) + (1 - y) \cdot \log(1 - p)\right]$$
  • In the continuous case, when the probabilities predicted by the network are not strictly 0 or 1, the same formula applies; it is then read as a measure of dissimilarity between the predicted probability distribution and the true binary labels (see the sketch after this list).
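A minimal NumPy sketch of this loss, averaged over a batch (the function name and the clipping epsilon are illustrative assumptions):

```python
import numpy as np

def binary_cross_entropy(y_true: np.ndarray, p_pred: np.ndarray) -> float:
    """Mean of -[y*log(p) + (1-y)*log(1-p)] over a batch (sketch)."""
    # Clip predictions away from 0 and 1 so the logarithms stay finite.
    eps = 1e-12
    p = np.clip(p_pred, eps, 1.0 - eps)
    return float(np.mean(-(y_true * np.log(p) + (1.0 - y_true) * np.log(1.0 - p))))

y = np.array([1.0, 0.0, 1.0])   # true binary labels
p = np.array([0.9, 0.2, 0.7])   # predicted probabilities of the positive class
print(binary_cross_entropy(y, p))  # roughly 0.228
```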

Multiple classes

For classification problems involving more than two classes, the binary cross-entropy loss extends to the categorical cross-entropy loss. For a single instance, it is defined as:

$$-\sum_{c=1}^{C} y_c \cdot \log(p_c)$$

where:

  • $C$ is the number of classes
  • $y_c$ is an indicator (1 if the actual class is $c$, 0 otherwise)
  • $p_c$ is the predicted probability that the instance belongs to class $c$

The goal during training is to minimize the average of this categorical cross-entropy loss over all instances in the training dataset, as in the sketch below.
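A minimal NumPy sketch with one-hot labels, averaged over a batch (the function name, shapes, and clipping epsilon are illustrative assumptions):

```python
import numpy as np

def categorical_cross_entropy(y_true: np.ndarray, p_pred: np.ndarray) -> float:
    """Mean of -sum_c y_c * log(p_c) over a batch (sketch).

    y_true: one-hot labels, shape (n_instances, n_classes)
    p_pred: predicted class probabilities (e.g. softmax outputs), same shape
    """
    eps = 1e-12
    p = np.clip(p_pred, eps, 1.0)
    # Because y_true is one-hot, the inner sum picks out -log(p) of the true class.
    return float(np.mean(-np.sum(y_true * np.log(p), axis=1)))

y = np.array([[1.0, 0.0, 0.0],
              [0.0, 0.0, 1.0]])
p = np.array([[0.7, 0.2, 0.1],
              [0.1, 0.3, 0.6]])
print(categorical_cross_entropy(y, p))  # roughly 0.434
```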
