Q15 – Softmax and Cross Entropy

The softmax function for m classes is given by

p_i = \frac{e^{x_i}}{\sum_{j=1}^m e^{x_j}} \text{ for } i = 1\ldots m.

It transforms a vector (x_i) of real values into a probability mass vector for a categorical distribution. It is often used in conjunction with the cross-entropy loss

L(x, y) = - \sum_{i=1}^m y_i \log p_i,

where y = (y_i) is the target distribution, typically a one-hot vector.
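As a quick illustration, here is a hedged NumPy sketch of these two definitions (the function names `softmax` and `cross_entropy` are mine, not from the post); the max-subtraction trick is a standard stabilization that leaves p_i unchanged:

```python
import numpy as np

def softmax(x):
    # Subtracting max(x) avoids overflow in exp; the ratio p_i is unchanged.
    z = np.exp(x - np.max(x))
    return z / z.sum()

def cross_entropy(x, y):
    # L(x, y) = -sum_i y_i log p_i with p = softmax(x)
    p = softmax(x)
    return -np.sum(y * np.log(p))

x = np.array([2.0, 1.0, 0.1])
y = np.array([1.0, 0.0, 0.0])  # one-hot target on class 1
p = softmax(x)
loss = cross_entropy(x, y)
```

For a one-hot y, the loss reduces to minus the log-probability assigned to the true class.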

  1. Find a simplified expression for p_i when m = 2.
  2. Differentiate p_i with respect to x_k.
  3. Differentiate L with respect to x_k.
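The answers to parts 1 and 3 can be checked numerically. The sketch below (my own, under the assumption that y is a valid probability vector) verifies that for m = 2 the softmax reduces to a sigmoid of the difference, p_1 = \sigma(x_1 - x_2), and that the analytic gradient of the loss is \partial L / \partial x_k = p_k - y_k, by comparing against a central finite difference:

```python
import numpy as np

def softmax(x):
    z = np.exp(x - np.max(x))
    return z / z.sum()

def loss(x, y):
    return -np.sum(y * np.log(softmax(x)))

# Part 1 check: for m = 2, p_1 = sigmoid(x_1 - x_2)
x2 = np.array([0.7, -0.3])
sigmoid_diff = 1.0 / (1.0 + np.exp(-(x2[0] - x2[1])))

# Part 3 check: dL/dx_k = p_k - y_k
rng = np.random.default_rng(0)
x = rng.normal(size=4)
y = np.zeros(4)
y[1] = 1.0  # one-hot target

analytic = softmax(x) - y  # claimed closed-form gradient

# Central-difference numerical gradient for comparison
eps = 1e-6
numeric = np.array([
    (loss(x + eps * np.eye(4)[k], y) - loss(x - eps * np.eye(4)[k], y)) / (2 * eps)
    for k in range(4)
])
```

The m = 2 case explains why binary logistic regression only needs a single logit: one of the two scores can be fixed at zero without loss of generality.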
