25 – Undirected Generative Models

In this last lecture, we will discuss undirected generative models. Specifically, we will look at the Restricted Boltzmann Machine and (to the extent that time permits) the Deep Boltzmann Machine.
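
For orientation, the Restricted Boltzmann Machine covered in these sections is the undirected model over binary visible units \mathbf{v} and hidden units \mathbf{h} defined by the energy function

    E(\mathbf{v}, \mathbf{h}) = -\mathbf{b}^{\top}\mathbf{v} - \mathbf{c}^{\top}\mathbf{h} - \mathbf{v}^{\top} W \mathbf{h}, \qquad p(\mathbf{v}, \mathbf{h}) = \frac{1}{Z} \exp(-E(\mathbf{v}, \mathbf{h})),

whose bipartite structure yields the factorial conditionals that make the block Gibbs sampling of Sections 17.3-17.4 tractable:

    p(h_{j} = 1 \mid \mathbf{v}) = \sigma\left(c_{j} + \mathbf{v}^{\top} W_{:,j}\right), \qquad p(v_{i} = 1 \mid \mathbf{h}) = \sigma\left(b_{i} + W_{i,:} \mathbf{h}\right).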

Slides:

Reference: (* = you are responsible for this material)

  • *Sections 20.1 to 20.4.4 (inclusive) of the Deep Learning textbook.
  • Sections 17.3-17.4 (MCMC, Gibbs sampling) and Chapter 19 (Approximate Inference) of the Deep Learning textbook.

Q13 – Activation Functions II

Contributed by Pulkit Khandelwal.

Consider the neural network shown in the figure below. All units have linear activation functions. Let the weights be as labeled in the figure, and suppose that the output of each unit is additionally multiplied by a constant k.

Answer the following questions:

  1. Re-design the neural network to compute the same function without using any hidden units. Express the new weights in terms of the old weights, and draw the resulting perceptron (see the sketch after this list).
  2. Can the space of functions represented by this network also be represented by linear regression?
  3. Is it always possible to express a neural network made up of only linear units as an equivalent network with no hidden layer? Give a brief justification.
  4. Now let the hidden units use sigmoid activation functions and let the output unit use a threshold activation function. Find weights that cause this network to compute the XOR of X_{1} and X_{2} for binary-valued X_{1} and X_{2}, assuming there are no bias terms. One candidate weight setting is checked numerically in the same sketch.
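
For items 1 and 4, here is a minimal NumPy sketch. Since the original figure is not reproduced here, the shapes of W1 and W2 are hypothetical placeholders for the figure's weights, and the XOR weights in the second part are one possible solution, not the only one.

    import numpy as np

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    # Item 1: scaled linear layers collapse into a single linear map.
    # Hypothetical shapes (the figure is not reproduced here): one hidden
    # layer with weights W1, output weights W2, each unit's output scaled by k.
    rng = np.random.default_rng(0)
    W1 = rng.normal(size=(3, 2))
    W2 = rng.normal(size=(1, 3))
    k, x = 2.0, rng.normal(size=2)
    y_network = k * (W2 @ (k * (W1 @ x)))   # original network with a hidden layer
    W_eff = k ** 2 * W2 @ W1                # weights of the equivalent perceptron
    assert np.allclose(y_network, W_eff @ x)

    # Item 4: one bias-free weight setting that computes XOR.
    # Hidden: h_j = sigmoid(w_j . x); output: 1 if v . h >= 0, else 0.
    W = np.array([[5.0, 5.0],               # hidden unit 1
                  [1.0, 1.0]])              # hidden unit 2
    v = np.array([3.0, -4.0])               # output weights, thresholded at 0
    for x1 in (0, 1):
        for x2 in (0, 1):
            h = sigmoid(W @ np.array([x1, x2]))
            print(x1, x2, "->", int(v @ h >= 0))   # prints the XOR truth table

The first assertion is exactly the answer pattern for item 1: the new perceptron weights are products of the old weights and powers of k.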

Q12 – Function Representation and Network Capacity

Contributed by Pulkit Khandelwal.

Suppose we are given two types of activation functions, linear and hard threshold, as defined below:

  • Linear:  y = w_{0} + \sum_{i}w_{i}x_{i}
  • Hard threshold:  y = \begin{cases} 1 & \text{if}\ w_{0} + \sum_{i} w_{i} x_{i} \geq 0 \\ 0 & \text{otherwise} \end{cases}

Which of the following can be represented exactly by a neural network with one hidden layer? You may use linear and/or hard-threshold activation functions. Justify each answer with a brief explanation (a construction for case 4 is sketched after the list).

  1. polynomials of degree 2
  2. polynomials of degree 1
  3. hinge loss
  4. piecewise constant functions
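
To make case 4 concrete, here is a sketch (with a hypothetical target function) that represents a piecewise constant function exactly using hard-threshold hidden units and a linear output unit: each hidden unit fires when the input crosses one breakpoint, and the linear output weights encode the jump at that breakpoint.

    import numpy as np

    def hard_threshold(z):
        # y = 1 if w0 + sum_i w_i x_i >= 0, else 0 (as defined above)
        return (z >= 0).astype(float)

    # Hypothetical target: f(x) = 0 for x < 1, 2 for 1 <= x < 3, -1 for x >= 3.
    breakpoints = np.array([1.0, 3.0])      # hidden unit j computes step(x - t_j)
    jumps = np.array([2.0, -3.0])           # jump sizes: 0 -> 2 -> -1

    def net(x):
        h = hard_threshold(x - breakpoints) # one threshold hidden unit per breakpoint
        return jumps @ h                    # linear output unit (w0 = 0)

    for x in (0.0, 2.0, 4.0):
        print(x, net(x))                    # 0.0, 2.0, -1.0

The same construction handles any finite number of pieces; deciding the remaining cases amounts to asking whether linear, quadratic, or hinge-shaped functions can be built from sums of linear and step functions.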

Q10 – Backpropagation

Contributed by Matthew Zak.

  1. Draw a simple computational graph (circuit) for f(x_1,x_2,x_3,x_4)=x_1x_2+x_3x_4 and compute all the derivatives of f with respect to the inputs (\frac{\partial f}{\partial x_1},\frac{\partial f}{\partial x_2},\frac{\partial f}{\partial x_3},\frac{\partial f}{\partial x_4}) using the chain rule (\frac{\partial f}{\partial x}=\frac{\partial f}{\partial q}\frac{\partial q}{\partial x}).
  2. Show how the gradient of f with respect to x_1 changes when the input x_2 is increased by \Delta h.
  3. Given a function g(f(x_1, x_2, x_3, x_4)), where f is the function above and g(t) = \sigma(t) is the sigmoid function, compute the derivative of g with respect to the input x_1 (\frac{\partial g}{\partial x_1}). A numerical check of these derivatives follows the list.
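
Here is a short NumPy sketch that checks the closed-form answers (\frac{\partial f}{\partial x_1} = x_2 and \frac{\partial g}{\partial x_1} = \sigma(f)(1-\sigma(f))\,x_2, and similarly for the other inputs by symmetry) against central finite differences at an arbitrary test point.

    import numpy as np

    def f(x):                               # f = x1*x2 + x3*x4
        return x[0] * x[1] + x[2] * x[3]

    def sigma(t):                           # sigmoid
        return 1.0 / (1.0 + np.exp(-t))

    def grad_f(x):                          # df/dx1 = x2, df/dx2 = x1, etc.
        return np.array([x[1], x[0], x[3], x[2]])

    def grad_g(x):                          # chain rule: dg/dx_i = sigma'(f) * df/dx_i
        s = sigma(f(x))
        return s * (1.0 - s) * grad_f(x)

    x = np.array([1.0, 2.0, -3.0, 0.5])     # arbitrary test point
    eps = 1e-6
    for i in range(4):                      # central finite-difference check
        e = np.zeros(4)
        e[i] = eps
        numeric = (sigma(f(x + e)) - sigma(f(x - e))) / (2 * eps)
        assert np.isclose(numeric, grad_g(x)[i])
    print(grad_f(x), grad_g(x))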