17 – Autoencoders

In this lecture we will take a closer look at a form of neural network known as an Autoencoder. If time permits, we will also take a look at an interesting variant on this theme known as Sparse Coding.
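
As a point of reference for the discussion in the comments below, here is a minimal sketch of a vanilla autoencoder. It is not taken from the lecture; the framework (PyTorch), the layer sizes and the mean-squared-error reconstruction loss are assumptions made purely for illustration.

    # Minimal autoencoder sketch (illustrative only; framework and sizes assumed).
    import torch
    import torch.nn as nn

    class Autoencoder(nn.Module):
        def __init__(self, input_dim=784, hidden_dim=64):
            super().__init__()
            # The encoder maps the input x to a lower-dimensional code h.
            self.encoder = nn.Sequential(nn.Linear(input_dim, hidden_dim), nn.ReLU())
            # The decoder maps the code h back to a reconstruction of x.
            self.decoder = nn.Linear(hidden_dim, input_dim)

        def forward(self, x):
            h = self.encoder(x)
            return self.decoder(h)

    # Training minimizes reconstruction error, i.e. the network learns to copy
    # its input to its output through the bottleneck h.
    model = Autoencoder()
    optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
    criterion = nn.MSELoss()
    x = torch.randn(32, 784)          # a dummy batch standing in for real data
    optimizer.zero_grad()
    loss = criterion(model(x), x)
    loss.backward()
    optimizer.step()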

Slides:

Reference: (* = you are responsible for this material)


6 thoughts on “17 – Autoencoders”

  1. It may be a trivial question, but there is something I don’t understand in Chapter 13 about linear factor models (including probabilistic PCA and factor analysis).

    It says that these approaches are “building a probabilistic model of the input $p_{\text{model}}(x)$”. These models define a linear decoder that generates $x$ by adding noise to a linear transformation of $h$ (the latent variables).

    The data generation process is:
    1 – Sample the explanatory factors $h \sim p(h)$.
    2 – Sample the real-valued observation $x = Wh + b + \text{noise}$.

    I don’t understand how it could generate data similar to some input data when there is no “encoder” that describes which explanatory factors/latent variables are important for this distribution. I think I am missing something, because I don’t see how we can train these models.
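
    To make the generation process above concrete, here is a small sketch of sampling from a linear-Gaussian factor model. The dimensions, the standard-normal prior on $h$ and the isotropic Gaussian noise (the probabilistic PCA case) are assumptions made for illustration, not something stated in the comment.

        # Sampling from a linear-Gaussian factor model (assumed: standard-normal
        # prior on h, isotropic Gaussian noise), matching the two steps above.
        import numpy as np

        rng = np.random.default_rng(0)
        d, k = 5, 2                  # observed dimension d, number of factors k
        W = rng.normal(size=(d, k))  # factor loadings (the linear "decoder")
        b = rng.normal(size=d)       # bias / mean offset
        sigma = 0.1                  # noise standard deviation

        h = rng.standard_normal(k)                      # 1 - sample h ~ p(h)
        x = W @ h + b + sigma * rng.standard_normal(d)  # 2 - sample x = Wh + b + noise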


    • I think I finally have the answer to my (own) question.

      The latent variables are trained with the maximum-likelihood criterion over the entire dataset. With gradient descent, we maximize $\sum_{x\in \text{data}} p(x|h)$, so the latent variables and the model are trained to fit the training data well.

      I was confused about the task of linear factor models (building a probability distribution of the input) compared to autoencoders (which are trained to copy their input to their output while learning useful features).


      • Note: as seen in class, the correct answer would be “we maximize $\sum_{x\in \text{data}} p(x)$” (instead of maximizing $p(x|h)$).


      • I don’t know if I have understood this point entirely.

        You say that we maximize $\sum_{x\in \text{data}} p(x)$, but how do the latent variables enter this optimization procedure?

        Your model starts from $h$ and reconstructs the input $x$, so it computes $p(x|h)$. How do you then link $p(x)$ and $p(x|h)$?


      • The way I understood it is the following.

        We use the latent variables to let us express $p(x)$. We run maximum-likelihood optimization on $p(x)$, which in turn affects our latent variables. Hoping this can help someone out; sorry if it doesn’t make sense 🙂
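
        For completeness, the link asked about above is marginalization over the latent variables; for the linear-Gaussian case (probabilistic PCA) the integral even has a closed form. This is standard textbook material rather than something stated in the thread:

        $p(x) = \int p(x|h)\, p(h)\, dh$

        With $h \sim \mathcal{N}(0, I)$ and isotropic Gaussian noise of variance $\sigma^2$, this gives

        $p(x) = \mathcal{N}(x;\ b,\ W W^\top + \sigma^2 I)$

        so maximizing $\sum_{x\in \text{data}} \log p(x)$ adjusts $W$, $b$ and $\sigma^2$ directly; the latent variables enter only through the integral.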

