Q3 – Reparameterization Trick of Variational Autoencoder

Contributed by Chin-Wei Huang.

Consider a generative model that factorizes as p(x,z) = p(x|z)p(z), where the likelihood p(x|z) = p(x;h_\theta(z)) is a simple, fully factorized distribution (e.g. Gaussian or Bernoulli, so p(x|z) = \prod_j p(x_j|z)) whose parameters are produced by a neural network h_\theta(\cdot), with \theta the set of parameters of the generative network (i.e. the decoder). In the Gaussian case, h_\theta(z) outputs a mean and a variance per dimension, since the distribution is fully factorized in the common setting. We take z\in\mathbf{R}^K, which implies a continuous latent space model, with prior p(z)=\mathcal{N}(0,I_K). The framework of auto-encoding variational Bayes maximizes a variational lower bound on the log-likelihood, \mathcal{L}(\theta,\phi)\leq \log p(x), which is expressed as

\mathcal{L}(\theta,\phi) = \mathbf{E}_{q_\phi}[\log p(x|z)] - \mathbf{KL}(q_\phi(z|x)||p_\theta(z)),

where \phi is the set of parameters used for the inference network (i.e. encoder). The reparameterization trick used in the original work rewrites the random variable in the variational distribution as

z = \mu(x) + \sigma(x)\odot\epsilon               (1)

where \epsilon\sim\mathcal{N}(\epsilon;0,I), so that gradients can be backpropagated through the stochastic bottleneck.
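As a numerical sanity check of (1), the sketch below (with illustrative, hand-picked values for \mu(x) and \sigma(x) rather than outputs of a trained encoder) draws \epsilon\sim\mathcal{N}(0,I) and verifies that the transformed samples have the empirical mean \mu(x) and standard deviation \sigma(x):

```python
import numpy as np

# Hypothetical encoder outputs for one datapoint x (illustrative values,
# not produced by an actual trained network).
K = 4
mu = np.array([0.5, -1.0, 2.0, 0.0])    # mean mu(x)
sigma = np.array([1.0, 0.5, 2.0, 0.1])  # per-dimension std-dev sigma(x)

rng = np.random.default_rng(0)
n = 200_000

# Reparameterization trick (eq. 1): sample eps ~ N(0, I), then transform.
# All randomness lives in eps, so mu and sigma are deterministic,
# differentiable paths that gradients can flow through.
eps = rng.standard_normal((n, K))
z = mu + sigma * eps

# Empirical moments match N(mu(x), diag(sigma(x)^2)).
print(np.allclose(z.mean(axis=0), mu, atol=0.02))   # True
print(np.allclose(z.std(axis=0), sigma, atol=0.02)) # True
```

The key point is that the sampling noise is isolated in \epsilon: in an autodiff framework the same two lines make z a differentiable function of the encoder outputs.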

  1. Prove that samples drawn from the linear transformation of Gaussian noise in (1) have the same mean and variance as \mathcal{N}(z;\mu(x),\sigma^2(x)). What if we instead write z=\mu(x)+S(x)\epsilon, where S(x)\in\mathbf{R}^{K\times K} could be a reshaped K^2-dimensional output of a neural net? Comment on the new distribution this reparameterization induces.
  2. If the full covariance variational distribution, i.e. with z=\mu(x)+S(x)\epsilon, is used, derive the second term of the lower bound \mathbf{KL}(q_\phi(z|x)||p_\theta(z)).
  3. If the traditional mean-field variational method is used instead, i.e. if we factorize the variational distribution for each datapoint as a product of distributions, q^{mf}(z_i) = \prod_j \mathcal{N}(z_{i,j}|m_{i,j},\sigma^2_{i,j}), and maximize the lower bound with respect to the variational parameters and model parameters iteratively, can the inference network q_\phi used in the variational autoencoder (with reparameterization (1)) outperform the mean-field method? What is the advantage of using an encoder as in the VAE?
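To build intuition for the second half of part 1, the sketch below (again with arbitrary illustrative values) checks empirically that z=\mu(x)+S(x)\epsilon yields a Gaussian whose covariance is S(x)S(x)^\top:

```python
import numpy as np

# Full-covariance reparameterization: z = mu + S @ eps, with S a K x K
# matrix (e.g. a reshaped K^2-dimensional network output).
# Illustrative values only, not from a trained network.
K = 3
rng = np.random.default_rng(1)
mu = np.array([1.0, -0.5, 0.3])
S = rng.standard_normal((K, K))  # arbitrary square matrix

n = 500_000
eps = rng.standard_normal((n, K))
z = mu + eps @ S.T               # batched form of mu + S @ eps

# A linear map of a Gaussian is Gaussian, with covariance S I S^T = S S^T.
emp_cov = np.cov(z, rowvar=False)
print(np.allclose(emp_cov, S @ S.T, atol=0.05))  # True
```

Note that only the product SS^\top affects the induced distribution, so S itself is not uniquely identified; a common practical choice is to parameterize S as a lower-triangular (Cholesky) factor.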

2 thoughts on “Q3 – Reparameterization Trick of Variational Autoencoder”

  1. I’m not entirely sure about #3: at first glance I’d say that in an ideal setting both methods would yield the same performance, with the VAE approach being much easier to train because you can backpropagate through the stochastic variable and thus simply use gradient descent. Am I missing something?

