# Q15 – Softmax and Cross Entropy

The softmax function for $m$ classes is given by

$p_i = \frac{e^{x_i}}{\sum_{j=1}^m e^{x_j}} \text{ for } i = 1\ldots m$.

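It transforms a vector $(x_i)$ of real values into the probability mass vector of a categorical distribution. It is often used in conjunction with the cross-entropy loss

$L(x, y) = - \sum_{i=1}^m y_i \log p_i$,

where $y = (y_i)$ is the target vector, typically a one-hot encoding of the true class or, more generally, a probability vector with $\sum_{i=1}^m y_i = 1$.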
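
To make the two definitions concrete, here is a minimal pure-Python sketch. The function names `softmax` and `cross_entropy` and the example values are illustrative choices, not part of the exercise. Subtracting `max(x)` before exponentiating is an added implementation detail: it leaves the result unchanged, because softmax is invariant to adding a constant to every $x_i$, and it avoids overflow for large inputs.

```python
import math


def softmax(x):
    """Return the softmax probabilities p_i for a list of real values x_i."""
    # Assumption (not in the exercise): shift by max(x) for numerical stability.
    shift = max(x)
    exps = [math.exp(v - shift) for v in x]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(x, y):
    """Return L(x, y) = -sum_i y_i * log(p_i), with p = softmax(x)."""
    p = softmax(x)
    # Terms with y_i == 0 are skipped (the usual convention 0 * log 0 = 0).
    return -sum(yi * math.log(pi) for yi, pi in zip(y, p) if yi != 0)


# Illustrative inputs: three classes, the true class is the last one.
x = [1.0, 2.0, 3.0]
y = [0.0, 0.0, 1.0]

p = softmax(x)
loss = cross_entropy(x, y)
print("p    =", p)
print("loss =", loss)
```

With a one-hot target the sum collapses to a single term, so the loss is the negative log-probability assigned to the true class.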

1. Find a simplified expression for $p_i$ when $m = 2$.
2. Differentiate $p_i$ with respect to $x_k$.
3. Differentiate $L$ with respect to $x_k$.
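
One way to check hand-derived answers to questions 2 and 3 is to compare them with finite-difference estimates. The sketch below uses central differences. The helper `numerical_partial`, the step size `eps`, and the test point are hypothetical choices made for illustration, and the max-shift inside `softmax` is a numerical-stability detail that does not alter the function being differentiated.

```python
import math


def softmax(x):
    # Shift by max(x) for numerical stability; the result is unchanged.
    shift = max(x)
    exps = [math.exp(v - shift) for v in x]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(x, y):
    p = softmax(x)
    # Skip terms with y_i == 0 (convention 0 * log 0 = 0).
    return -sum(yi * math.log(pi) for yi, pi in zip(y, p) if yi != 0)


def numerical_partial(f, x, k, eps=1e-6):
    """Central-difference estimate of the partial derivative of f at x w.r.t. x[k]."""
    x_plus = list(x)
    x_minus = list(x)
    x_plus[k] += eps
    x_minus[k] -= eps
    return (f(x_plus) - f(x_minus)) / (2 * eps)


# Illustrative test point (0-based indices in code, 1-based in the text).
x = [0.5, -1.0, 2.0]
y = [0.0, 1.0, 0.0]
m = len(x)

# Estimates of dp_i/dx_k for every pair (i, k).
dp = [[numerical_partial(lambda v, i=i: softmax(v)[i], x, k) for k in range(m)]
      for i in range(m)]

# Estimates of dL/dx_k for every k.
dL = [numerical_partial(lambda v: cross_entropy(v, y), x, k) for k in range(m)]

print("dp/dx =", dp)
print("dL/dx =", dL)
```

Evaluating your closed-form expressions at the same `x` and `y` should reproduce these numbers up to an error on the order of `eps`.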
