As announced in class, the final exam will take place today, April 13th, 2017, in room 1360, Pav. André-Aisenstadt, from 9h00 to 12h00 (three hours long). The exam will be closed-book, i.e. no notes will be allowed. You are allowed a calculator.
Author: aaroncourville
25 – Undirected Generative Models
In this last lecture, we will discuss undirected generative models. Specifically we will look at the Restricted Boltzmann Machine and (to the extent that time permits) the Deep Boltzmann Machine.
Slides:
Reference: (* = you are responsible for this material)
 *Sections 20.1 to 20.4.4 (inclusively) of the Deep Learning textbook.
 Sections 17.3 to 17.4 (MCMC, Gibbs sampling) and chapter 19 (Approximate Inference) of the Deep Learning textbook.
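To make the starred RBM material concrete, here is a minimal block-Gibbs sampling sketch for a binary restricted Boltzmann machine. The sizes, random weights, and chain length are illustrative choices, not anything from the lecture.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical small RBM: 6 visible units, 4 hidden units.
n_v, n_h = 6, 4
W = rng.normal(scale=0.1, size=(n_v, n_h))  # weights
b = np.zeros(n_v)                           # visible biases
c = np.zeros(n_h)                           # hidden biases

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def gibbs_step(v):
    # The RBM graph is bipartite, so all hidden units are conditionally
    # independent given v (and vice versa): each half of the Gibbs step
    # is one vectorized sample rather than a unit-by-unit sweep.
    p_h = sigmoid(c + v @ W)                # P(h_j = 1 | v)
    h = (rng.random(n_h) < p_h).astype(float)
    p_v = sigmoid(b + h @ W.T)              # P(v_i = 1 | h)
    v_new = (rng.random(n_v) < p_v).astype(float)
    return v_new, h

v = rng.integers(0, 2, size=n_v).astype(float)
for _ in range(100):                        # run the chain for a while
    v, h = gibbs_step(v)
```

This is exactly the alternating conditional sampling covered in the Gibbs-sampling sections above; training (e.g. contrastive divergence) is a separate matter.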
New blog post on momentum
There is a great new Distill post on understanding momentum:

Goh, "Why Momentum Really Works", Distill, 2017. http://doi.org/10.23915/distill.00006
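The post analyzes momentum on convex quadratics. A minimal sketch of the heavy-ball update it studies, on an ill-conditioned quadratic; the matrix, step size, and momentum coefficient below are illustrative choices, not values taken from the post.

```python
import numpy as np

# Model problem: f(w) = 0.5 * w^T A w, minimized at w = 0.
A = np.diag([1.0, 100.0])        # condition number 100
grad = lambda w: A @ w

w = np.array([1.0, 1.0])
z = np.zeros_like(w)             # velocity
alpha, beta = 0.01, 0.9          # step size and momentum coefficient

for _ in range(500):
    z = beta * z + grad(w)       # exponentially-weighted sum of past gradients
    w = w - alpha * z            # step along the accumulated direction

print(np.linalg.norm(w))         # close to the minimum at 0
```

With beta = 0 this reduces to plain gradient descent, which converges much more slowly on this problem because of the spread in curvatures.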
Q13 – Activation Functions II
Contributed by Pulkit Khandelwal.
Consider the neural network shown in the figure below. The network has linear activation functions. Let the various weights be defined as shown in the figure, and let the output of each unit be multiplied by some constant k.
Answer the following questions:
 Redesign the neural network to compute the same function without using any hidden units. Express the new weights in terms of the old weights. Draw the obtained perceptron.
 Can the space of functions represented by the above artificial neural network also be represented by linear regression?
 Is it always possible to express a neural network made up of only linear units as an equivalent network without a hidden layer? Give a brief justification.
 Now let the hidden units use sigmoid activation functions and let the output unit use a threshold activation function. Find weights that cause this network to compute the XOR of x1 and x2 for binary-valued x1 and x2. Assume that there are no bias terms.
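For the first part of the question, a composition of linear layers is itself linear, so the hidden layer can always be collapsed into a single weight matrix. A sketch with hypothetical shapes (3 inputs, 4 hidden units, 1 output, each unit's output scaled by k as in the question):

```python
import numpy as np

rng = np.random.default_rng(0)

W1 = rng.normal(size=(4, 3))   # input-to-hidden weights
W2 = rng.normal(size=(1, 4))   # hidden-to-output weights
k = 2.0                        # per-unit output scaling from the question

def two_layer(x):
    h = k * (W1 @ x)           # linear hidden layer, scaled by k
    return k * (W2 @ h)        # linear output unit, scaled by k

# Equivalent perceptron: fold the composition into one matrix.
W_eq = (k ** 2) * (W2 @ W1)

x = rng.normal(size=3)
assert np.allclose(two_layer(x), W_eq @ x)
```

The new weights are just the product of the old weight matrices times k squared, which is the algebraic content of "redesign the network without hidden units".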
Q12 – Function Representation and Network Capacity
Contributed by Pulkit Khandelwal.
Let us say that we are given two types of activation functions, linear and a hard threshold function, as stated below:
 Linear: g(z) = z
 Hard Threshold: g(z) = 1 if z >= 0, and 0 otherwise
Which of the following can be exactly represented by a neural network with one hidden layer? You can use linear and/or threshold activation functions. Justify your answer with a brief explanation.
 polynomials of degree 2
 polynomials of degree 1
 hinge loss
 piecewise constant functions
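For the piecewise-constant case, one hidden layer of hard-threshold units suffices: each unit turns on past a breakpoint, and the output weight adds that breakpoint's jump. A sketch with arbitrary illustrative breakpoints and jump sizes:

```python
import numpy as np

def hard_threshold(z):
    return (z >= 0).astype(float)

# One hidden threshold unit per breakpoint; the output weight is the
# size of the jump at that breakpoint. A linear output unit sums them.
breakpoints = np.array([0.0, 1.0, 2.5])   # negated hidden biases
jumps = np.array([1.0, -2.0, 3.0])        # output weights

def f(x):
    h = hard_threshold(x - breakpoints)   # which breakpoints x has passed
    return jumps @ h                      # cumulative sum of the jumps

print(f(-1.0))  # 0.0: no unit active
print(f(0.5))   # 1.0: first unit active
print(f(3.0))   # 2.0: all units active, 1 - 2 + 3
```

By contrast, no finite network of linear and threshold units can exactly represent a degree-2 polynomial, since the network output is always piecewise linear in its input.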
Q11 – Pooling in transpose convolution (deconvolution) layers
Contributed by Vasken Dermardiros.
 What is the role of pooling in convolutional layers?
 What would pooling result in when used in upconvolution (transpose convolution) layers?
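As a hint for thinking about this, here is a minimal 1-D sketch of max pooling and of a simple "unpooling" (nearest-neighbour upsampling) of the kind sometimes paired with transpose convolutions. The array and window size are illustrative.

```python
import numpy as np

def max_pool_1d(x, size=2):
    # Keep the strongest activation per window: spatial size shrinks.
    return x.reshape(-1, size).max(axis=1)

def unpool_1d(x, size=2):
    # Spread each value back over its window: size is restored,
    # but the fine detail discarded by pooling is gone.
    return np.repeat(x, size)

x = np.array([1.0, 3.0, 2.0, 0.0])
pooled = max_pool_1d(x)    # [3.0, 2.0]: downsampled, some invariance gained
up = unpool_1d(pooled)     # [3.0, 3.0, 2.0, 2.0]: upsampled, detail lost
```

Pooling trades spatial resolution for translation invariance; in an upsampling path the relevant question is how (or whether) that lost resolution can be recovered.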
Q10 – Backpropagation
Contributed by Matthew Zak.
 Create a very simple computational graph (circuit) for a given function f and compute all the derivatives of f with respect to its inputs using the chain rule.
 Show how the gradient of f with respect to an input changes when we increase that input by a small amount.
 Given a function g = sigmoid(f), where f is the function above, compute the derivative of g with respect to an input.
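The specific function for this question did not survive in the post; as an illustration, take the common textbook circuit f(x, y, z) = (x + y) * z, put a sigmoid on top of it, and backpropagate through both with the chain rule.

```python
import math

def forward_backward(x, y, z):
    # Forward pass through the circuit.
    q = x + y                       # intermediate node
    f = q * z
    # Backward pass for f: df/dq = z, df/dz = q, dq/dx = dq/dy = 1.
    df_dx, df_dy, df_dz = z, z, q
    # Compose with a sigmoid: g = sigmoid(f).
    g = 1.0 / (1.0 + math.exp(-f))
    dg_df = g * (1.0 - g)           # derivative of the sigmoid at f
    dg_dx = dg_df * df_dx           # chain rule: dg/dx = dg/df * df/dx
    return f, (df_dx, df_dy, df_dz), g, dg_dx

f, grads_f, g, dg_dx = forward_backward(-2.0, 5.0, -4.0)
print(grads_f)   # (-4.0, -4.0, 3.0)
```

For the second part of the question, a first-order Taylor expansion says that increasing an input by a small h changes f by approximately (df/dinput) * h, which is exactly what these computed derivatives predict.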