23 – Autoregressive Generative Models

In these lectures, we discuss autoregressive generative models such as NADE, MADE, PixelCNN, PixelRNN, and PixelVAE.

Slides:

Reference: (* = you are responsible for this material)

4 thoughts on “23 – Autoregressive Generative Models”

  1. Practical question: how do you actually plug in the captions?

    If you have a GAN whose generator creates an image from a noise vector, the caption can be an extra input alongside the noise; similarly, the discriminator sees both the image and the caption and evaluates accordingly.

    But how do you go from text to a vector/image? I’ve trained a char-level LSTM RNN, and it outputs more text fine. Instead of outputting text, it needs to output some sort of embedding? What is an embedding, anyway? A dictionary of words or word fragments plus a conditional probability in [0, 1] based on previous inputs?

    • The hidden states of the RNN can be considered an embedding. What is commonly done is to take the last hidden state (which is a vector) and treat it as a summary of what the RNN has seen. Alternatively, you can attend to the sequence of hidden states using any of the schemes discussed in the last lecture.

      You can map the vector to an “image” by a linear mapping to a high-dimensional space and reshaping the resulting vector to have an image shape. E.g. if your vector has size 128 and you want to end up with a 64×64 image, you can use a 128×4096 weight matrix to go from 128 to 4096 and then reshape to 64×64. If you want multiple feature maps, say 64×64×10, use a 128×40960 matrix instead (both steps are sketched below).
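
A minimal sketch of the reply’s first point, assuming PyTorch (the lecture does not prescribe a framework): a char-level LSTM encodes the caption, its last hidden state serves as the caption embedding, and the generator is conditioned by concatenating that embedding with the noise vector, as described in the question above. All sizes are illustrative placeholders, not values from the lecture.

```python
import torch
import torch.nn as nn

# Hypothetical sizes, chosen only for illustration.
VOCAB_SIZE = 100   # number of distinct characters
HIDDEN = 128       # LSTM hidden size = caption embedding size
NOISE = 100        # GAN noise dimension

class CaptionEncoder(nn.Module):
    """Char-level LSTM; the last hidden state summarizes the caption."""
    def __init__(self):
        super().__init__()
        self.embed = nn.Embedding(VOCAB_SIZE, 32)
        self.lstm = nn.LSTM(32, HIDDEN, batch_first=True)

    def forward(self, chars):                      # chars: (batch, seq_len) ints
        _, (h_last, _) = self.lstm(self.embed(chars))
        return h_last[-1]                          # (batch, HIDDEN)

encoder = CaptionEncoder()
chars = torch.randint(0, VOCAB_SIZE, (4, 20))      # batch of 4 captions, 20 chars each
caption_vec = encoder(chars)                       # (4, 128) caption embedding
z = torch.randn(4, NOISE)                          # (4, 100) noise vector
gen_input = torch.cat([z, caption_vec], dim=1)     # (4, 228): generator input
# The discriminator can be conditioned the same way: concatenate the caption
# embedding with whatever features it extracts from the image.
```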
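And the linear-map-and-reshape step, with the 128→4096 and 128→40960 sizes taken directly from the example in the reply; the batch size here is arbitrary.

```python
import torch
import torch.nn as nn

# 128 -> 4096, i.e. the 128x4096 weight matrix from the reply, then reshape.
to_image = nn.Linear(128, 64 * 64)
v = torch.randn(8, 128)                    # a batch of 8 embedding vectors
img = to_image(v).view(-1, 1, 64, 64)      # reshape 4096 -> one 64x64 "image"

# For 10 feature maps (64x64x10), a 128x40960 mapping:
to_maps = nn.Linear(128, 64 * 64 * 10)
maps = to_maps(v).view(-1, 10, 64, 64)     # (8, 10, 64, 64)
```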
