Contributed by Chin-Wei Huang.
Consider a generative model that factorizes as $p_\theta(x, z) = p_\theta(x \mid z)\, p(z)$, where $z$ is mapped through a neural net, i.e. $\eta = g_\theta(z)$, $\theta$ being the set of parameters for the generative network (i.e. decoder), and $p_\theta(x \mid z) = p(x \mid \eta)$ a simple distribution parameterized by $\eta$, such as a Gaussian or a Bernoulli. In the Gaussian case, $\eta$ refers to the mean and variance of each dimension, as the distribution is fully factorized in the common setting. We have $z \in \mathbb{R}^K$, which implies a continuous latent space model, and $p(z) = \mathcal{N}(0, I_K)$. The framework of auto-encoding variational Bayes considers maximizing the variational lower bound on the log-likelihood $\log p_\theta(x)$, which is expressed as
$$
\mathcal{L}(\theta, \phi; x) = \mathbb{E}_{q_\phi(z \mid x)}\left[\log p_\theta(x \mid z)\right] - D_{\mathrm{KL}}\left(q_\phi(z \mid x) \,\|\, p(z)\right),
$$
where $\phi$ is the set of parameters of the inference network (i.e. encoder). The reparameterization trick used in the original work rewrites the random variable $z$ in the variational distribution $q_\phi(z \mid x) = \mathcal{N}(\mu, \operatorname{diag}(\sigma^2))$ as
$$
z = \mu + \sigma \odot \epsilon, \tag{1}
$$
where $\epsilon \sim \mathcal{N}(0, I_K)$, so that gradients can be backpropagated through the stochastic bottleneck.
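As a concrete illustration of how (1) lets gradients flow through the sampling step, below is a minimal sketch of a one-sample estimate of the lower bound in PyTorch. The network sizes, the Bernoulli likelihood, and the toy data are illustrative assumptions, not part of the exercise; the closed-form KL term is the standard one between $\mathcal{N}(\mu, \operatorname{diag}(\sigma^2))$ and $\mathcal{N}(0, I)$.

```python
# Minimal sketch of the reparameterization trick (1); sizes and the Bernoulli
# likelihood are illustrative assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F

K, D = 8, 784  # latent and data dimensions (hypothetical)

encoder = nn.Sequential(nn.Linear(D, 128), nn.ReLU(), nn.Linear(128, 2 * K))
decoder = nn.Sequential(nn.Linear(K, 128), nn.ReLU(), nn.Linear(128, D))

def elbo(x):
    # Inference network outputs mu and log sigma^2 of q_phi(z | x).
    mu, log_var = encoder(x).chunk(2, dim=-1)
    sigma = torch.exp(0.5 * log_var)

    # Reparameterization (1): z = mu + sigma * eps with eps ~ N(0, I), so the
    # gradient flows through mu and sigma rather than through the sampling.
    eps = torch.randn_like(sigma)
    z = mu + sigma * eps

    # Reconstruction term E_q[log p_theta(x | z)]: one-sample Monte Carlo
    # estimate with a Bernoulli decoder.
    logits = decoder(z)
    recon = -F.binary_cross_entropy_with_logits(logits, x, reduction="none").sum(-1)

    # Closed-form KL( N(mu, diag(sigma^2)) || N(0, I) ).
    kl = 0.5 * (mu.pow(2) + log_var.exp() - log_var - 1).sum(-1)
    return (recon - kl).mean()

x = torch.rand(16, D).bernoulli()  # toy binary data
(-elbo(x)).backward()              # gradients reach both encoder and decoder
```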
- Prove that the samples drawn from the linear transformation of Gaussian noise in (1) have the same mean and variance as $q_\phi(z \mid x)$. What if we write $z = \mu + V\epsilon$, where $V$ is a $K \times K$ matrix formed by reshaping a $K^2$-dimensional output of a neural net? Comment on the new distribution this reparameterization induces. (A numerical sanity check is sketched after this list.)
- If the full-covariance variational distribution is used, i.e. $q_\phi(z \mid x) = \mathcal{N}(\mu, \Sigma)$ with $\Sigma = V V^\top$, derive the second term of the lower bound, $D_{\mathrm{KL}}\left(q_\phi(z \mid x) \,\|\, p(z)\right)$.
- If the traditional mean-field variational method is used, i.e. if we factorize the variational distribution as a product of independent factors, $q(z) = \prod_{j=1}^{K} q_j(z_j)$, and we maximize the lower bound with respect to the variational parameters and the model parameters iteratively, can the inference network used in the variational autoencoder, with the reparameterization in (1), outperform the mean-field method? What is the advantage of using an encoder as in the VAE?
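The sketch below is a Monte Carlo sanity check, with arbitrary hypothetical values of $\mu$ and $V$, of the quantities involved in the first two questions: it estimates the empirical mean and covariance of samples $z = \mu + V\epsilon$ and compares them against $\mu$ and $VV^\top$. It is only a numerical check under these assumptions, not a substitute for the requested proof or derivation.

```python
# Monte Carlo check (hypothetical mu and V): samples z = mu + V eps with
# eps ~ N(0, I) should have mean mu and covariance V V^T.
import numpy as np

rng = np.random.default_rng(0)
K, n = 3, 200_000

mu = rng.normal(size=K)
V = rng.normal(size=(K, K))   # e.g. a reshaped K^2-dimensional network output

eps = rng.standard_normal((n, K))
z = mu + eps @ V.T            # each row is mu + V eps_i

print("max |mean error|:", np.abs(z.mean(axis=0) - mu).max())
print("max |cov  error|:", np.abs(np.cov(z, rowvar=False) - V @ V.T).max())
```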