# 17 – Autoencoders

In this lecture we will take a closer look at a form of neural network known as an Autoencoder. If time permits, we will also take a look at an interesting variant on this theme known as Sparse Coding.

Slides:

Reference: (* = you are responsible for this material)

## 6 thoughts on “17 – Autoencoders”

1. Stéphanie Larocque says:

It may be a trivial question, but there is something I don’t understand in Chapter 13 about linear factor models (including probabilistic PCA and factor analysis).

It says that these approaches are “building a probabilistic model of the input $p_{\text{model}}(x)$”. These models define a linear decoder that generates $x$ by adding noise to a linear transformation of the latent variables $h$.

The data generation process is:
1. Sample the explanatory factors $h \sim p(h)$.
2. Sample real-valued $x = Wh + b + \text{noise}$.
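A minimal numeric sketch of this two-step generative process (the dimensions, weights, and noise level below are made up for illustration; probabilistic PCA uses a standard normal prior over $h$ and isotropic Gaussian noise):

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative sizes, not from the chapter: 2 latent factors, 5 observed dimensions.
latent_dim, data_dim = 2, 5
W = rng.normal(size=(data_dim, latent_dim))  # linear decoder weights
b = rng.normal(size=data_dim)                # bias
sigma = 0.1                                  # noise standard deviation

def sample_x(n):
    """Generate n samples via the two-step process above."""
    h = rng.normal(size=(n, latent_dim))            # 1. sample h ~ p(h) = N(0, I)
    noise = sigma * rng.normal(size=(n, data_dim))  # isotropic Gaussian noise
    return h @ W.T + b + noise                      # 2. x = W h + b + noise

X = sample_x(100_000)
# Marginally x ~ N(b, W W^T + sigma^2 I), so the sample moments should be close:
print(np.abs(X.mean(axis=0) - b).max())
print(np.abs(np.cov(X.T) - (W @ W.T + sigma**2 * np.eye(data_dim))).max())
```

Training then fits $W$, $b$, and $\sigma$ by maximum likelihood over the dataset, as the replies in this thread discuss.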

I don’t understand how it could generate data similar to some input data when there is no “encoder” that describes which explanatory factors/latent variables are important for this distribution. I think I missed something, because I don’t understand how we can train these models.


• Stéphanie Larocque says:

I think I finally have the answer to my (own) question.

The latent variables are trained with the maximum likelihood criterion over the entire dataset. With gradient descent, we maximize $\sum_{x\in \text{data}} p(x|h)$, so the latent variables and the model are trained to fit the training data well.

I was confused between the task of linear factor models (building a probabilistic distribution of the input) and that of autoencoders (trained to copy their input to their output while learning useful features).


• Stéphanie Larocque says:

Note: as seen in class, the correct answer would be “we maximize $\sum_{x\in \text{data}} p(x)$” (instead of maximizing $p(x|h)$).


• I don’t know if I have understood this point entirely.

You say that you maximize $\sum_{x\in \text{data}} p(x)$, but how do the latent variables enter this optimization procedure?

Your model starts from $h$ and reconstructs the input $x$, so it computes $p(x|h)$. How, then, do you link $p(x)$ and $p(x|h)$?
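One way to see the link (for the linear factor models discussed here, not for plain autoencoders) is marginalization: $p(x) = \int p(x|h)\,p(h)\,dh$. With a standard normal prior and isotropic Gaussian noise this integral has a closed form, $x \sim \mathcal{N}(b,\, WW^\top + \sigma^2 I)$. A small sketch can check this against a Monte Carlo average of $p(x|h)$ over the prior; all parameter values below are illustrative assumptions, not from the lecture:

```python
import numpy as np

rng = np.random.default_rng(1)

latent_dim, data_dim = 2, 3
W = rng.normal(size=(data_dim, latent_dim))  # decoder weights (illustrative)
b = np.zeros(data_dim)
sigma = 0.5                                  # noise standard deviation

x = rng.normal(size=data_dim)  # an arbitrary test point

# Closed form: integrating h out of p(x|h) p(h) gives x ~ N(b, W W^T + sigma^2 I).
cov = W @ W.T + sigma**2 * np.eye(data_dim)
diff = x - b
exact = np.exp(-0.5 * diff @ np.linalg.solve(cov, diff)) / np.sqrt(
    (2 * np.pi) ** data_dim * np.linalg.det(cov)
)

# Monte Carlo: p(x) = E_{h ~ p(h)}[p(x|h)], i.e. average the decoder's Gaussian
# likelihood of x over samples drawn from the prior.
h = rng.normal(size=(200_000, latent_dim))
sq = ((x - (h @ W.T + b)) ** 2).sum(axis=1)
mc = (np.exp(-sq / (2 * sigma**2)) / (2 * np.pi * sigma**2) ** (data_dim / 2)).mean()

print(exact, mc)  # the two estimates should roughly agree
```

So $p(x)$ is computed from $p(x|h)$ by integrating the latent variables out; no encoder is needed for the model to define a distribution over $x$.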


• For pure autoencoders (i.e. not VAEs), we don’t start with $h$, precisely because we have no prior for it.


• The way I understood it is the following.

We use latent variables to allow us to work out $p(x)$. We run the optimization to maximize the likelihood of $p(x)$, which in turn shapes our latent variables. Hoping this can help someone out; sorry if it doesn’t make sense 🙂
