Contributed by Chin-Wei Huang.
Consider a univariate-output Mixture Density Network, with $K$ specifying the number of values the latent variable $z$ can take (the number of mixture components):
$$p(y \mid x) = \sum_{k=1}^{K} p(z = k \mid x)\, p(y \mid z = k, x; \theta_k),$$
where $\{\theta_k\}_{k=1}^{K}$ are the set of class-conditional parameters.
In the following questions, assume the "prior" probability distributions $p(z = k \mid x)$ form a multinoulli distribution, parameterized by a softmax function (a one-hidden-layer network) mapping from the input, i.e.
$$p(z = k \mid x) = \pi_k(x) = \frac{\exp\!\big(w_k^\top h(x)\big)}{\sum_{j=1}^{K} \exp\!\big(w_j^\top h(x)\big)},$$
where $h(x)$ is the hidden-layer representation of $x$ and $\{w_k\}$ are the softmax weight parameters.
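Before attempting the questions, it can help to see the model written out concretely. The sketch below is a minimal numpy implementation of the conditional density above, taking the class-conditionals to be Gaussian (as in the first question). All parameter names and shapes are illustrative assumptions, not part of the exercise statement; for simplicity the component means are linear in $x$ and the standard deviations are input-independent.

```python
import numpy as np

def softmax(a):
    # Numerically stable softmax over the last axis.
    a = a - a.max(axis=-1, keepdims=True)
    e = np.exp(a)
    return e / e.sum(axis=-1, keepdims=True)

def mdn_density(y, x, params):
    """Density p(y|x) of a univariate MDN with K Gaussian components.

    params is a dict with (hypothetical shapes, for input dim d, hidden dim H):
      V (H, d), c (H,)      -- one hidden layer shared by the gating network
      W (K, H), b (K,)      -- softmax weights giving the multinoulli prior
      mu (K, d)             -- linear component means mu_k(x) = mu[k] @ x
      log_sigma (K,)        -- per-component log standard deviations
    """
    h = np.tanh(params["V"] @ x + params["c"])      # hidden representation h(x)
    pi = softmax(params["W"] @ h + params["b"])     # prior p(z=k|x)
    mu = params["mu"] @ x                           # component means mu_k(x)
    sigma = np.exp(params["log_sigma"])             # component stds sigma_k
    comp = np.exp(-0.5 * ((y - mu) / sigma) ** 2) / (sigma * np.sqrt(2 * np.pi))
    return float(pi @ comp)                         # sum_k pi_k N(y; mu_k, sigma_k^2)
```

Because $\pi(x)$ is a proper distribution over $k$ and each component is a normalized Gaussian, the returned density is positive and integrates to one in $y$ for any input $x$.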
- Suppose $y$ is continuous, and let $p(y \mid z = k, x) = \mathcal{N}\!\big(y; \mu_k(x), \sigma_k^2(x)\big)$. To do prediction, use the expected conditional $\mathbb{E}[y \mid x]$ as a point estimate of the output. Derive $\mathbb{E}[y \mid x]$ and $\operatorname{Var}[y \mid x]$.
- Holding the class-conditional parameters ($\{\theta_k\}$) fixed, derive a stochastic (i.e., for one data point) gradient-ascent expression for the softmax weight parameters $\{w_k\}$ using the maximum-likelihood principle. (Hint: consider the M-step of the EM algorithm.)
- Now devise a prediction mapping function $f$ defined as $f(x) = \sum_{k=1}^{K} g_k(x)\, m_k(x)$, where $g$ generally is an MLP and each $m_k$ is a prediction function depending on the input $x$. Now let $g$ be a softmax regression over $K$ classes and $\{m_k\}$ be a set of linear mapping functions, i.e. $m_k(x) = v_k^\top x$. If we want to minimise the quadratic loss $\frac{1}{2}\big(y^{(n)} - f(x^{(n)})\big)^2$ for each data point $n$, what is the gradient-descent update expression for the parameters $v_k$ if $g$ is fixed?
- Comment on the difference between the previous two training objectives.
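When working through the gradient questions above, hand-derived expressions are easy to sanity-check numerically without revealing the answers here. The sketch below is a generic central finite-difference checker (all names are hypothetical): pass it your objective (the per-example log-likelihood for the second question, or the quadratic loss for the third) and your derived gradient, and it reports whether the two agree.

```python
import numpy as np

def numeric_grad(f, W, eps=1e-6):
    """Central finite-difference estimate of df/dW for a scalar-valued f."""
    g = np.zeros_like(W)
    it = np.nditer(W, flags=["multi_index"])
    for _ in it:
        idx = it.multi_index
        old = W[idx]
        W[idx] = old + eps
        fp = f(W)                      # f evaluated at W + eps * e_idx
        W[idx] = old - eps
        fm = f(W)                      # f evaluated at W - eps * e_idx
        W[idx] = old                   # restore the original entry
        g[idx] = (fp - fm) / (2 * eps)
    return g

def check_gradient(f, analytic_grad, W, tol=1e-5):
    """True if the hand-derived gradient matches finite differences on W."""
    g_num = numeric_grad(f, W.copy())  # copy: numeric_grad perturbs entries
    g_ana = analytic_grad(W)
    return bool(np.max(np.abs(g_num - g_ana)) < tol)
```

For example, for the objective $f(W) = \tfrac{1}{2}\lVert W\rVert^2$ the gradient is $W$ itself, and `check_gradient` confirms this while rejecting a deliberately wrong candidate such as $2W$.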