Final Exam

As announced in class, the final exam will take place today, April 13th, 2017, in room 1360 Pav. André-Aisenstadt, from 9h00 to 12h00 (three hours). The exam is closed-book: no notes are allowed, but you may use a calculator.

25 – Undirected Generative Models

In this last lecture, we will discuss undirected generative models. Specifically, we will look at the Restricted Boltzmann Machine and, time permitting, the Deep Boltzmann Machine.

Slides:

References: (* = you are responsible for this material)

• *Sections 20.1 to 20.4.4 (inclusive) of the Deep Learning textbook.
• Sections 17.3–17.4 (MCMC, Gibbs sampling) and chapter 19 (Approximate Inference) of the Deep Learning textbook.
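As a taste of the sampling machinery covered in sections 17.3–17.4, here is a minimal sketch of block Gibbs sampling in a binary RBM. The sizes and parameters are toy values chosen for illustration, not anything from the lecture:

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# Toy RBM parameters (random; purely illustrative).
n_visible, n_hidden = 6, 4
W = rng.normal(0.0, 0.1, size=(n_visible, n_hidden))
b = np.zeros(n_visible)   # visible biases
c = np.zeros(n_hidden)    # hidden biases

def gibbs_step(v):
    """One step of block Gibbs sampling: v -> h -> v'."""
    p_h = sigmoid(c + v @ W)                        # P(h_j = 1 | v)
    h = (rng.random(n_hidden) < p_h).astype(float)  # sample hidden layer
    p_v = sigmoid(b + h @ W.T)                      # P(v_i = 1 | h)
    return (rng.random(n_visible) < p_v).astype(float)

v = (rng.random(n_visible) < 0.5).astype(float)  # random binary start
for _ in range(100):
    v = gibbs_step(v)
print(v.shape)  # (6,)
```

Because the RBM is bipartite, all hidden units are conditionally independent given the visible layer (and vice versa), which is what makes this block update possible.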

New blog post on momentum

There is a great new Distill post on understanding momentum:

• Goh, "Why Momentum Really Works", Distill, 2017. http://doi.org/10.23915/distill.00006
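For reference, the classical (heavy-ball) momentum update that the post analyzes can be sketched as follows; the 1-D quadratic objective, step size, and momentum coefficient are arbitrary toy choices:

```python
def momentum_minimize(grad, x0, lr=0.1, beta=0.9, steps=200):
    """Classical momentum: v <- beta*v - lr*grad(x); x <- x + v."""
    x, v = x0, 0.0
    for _ in range(steps):
        v = beta * v - lr * grad(x)  # velocity accumulates past gradients
        x = x + v
    return x

# Minimize f(x) = x^2 / 2, whose gradient is x; the minimum is at x = 0.
x_star = momentum_minimize(lambda x: x, x0=5.0)
print(abs(x_star) < 0.01)
```

With `beta = 0` this reduces to plain gradient descent; the post studies exactly how the extra velocity term changes the convergence behavior.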

Q13 – Activation Functions II

Contributed by Pulkit Khandelwal.

Consider the neural network shown in the figure below. The network uses linear activation functions. Let the weights be defined as shown in the figure, and let the output of each unit additionally be multiplied by a constant $k$.

1. Redesign the neural network to compute the same function without any hidden units. Express the new weights in terms of the old weights, and draw the resulting perceptron.
2. Can the space of functions represented by the above neural network also be represented by linear regression?
3. Is it always possible to re-express a neural network made up of only linear units as an equivalent network without hidden layers? Give a brief justification.
4. Now let the hidden units use sigmoid activation functions and let the output unit use a threshold activation function. Find weights that cause this network to compute the XOR of $X_{1}$ and $X_{2}$ for binary-valued $X_{1}$ and $X_{2}$. Assume that there are no bias terms.
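As a sanity check for part 1, the sketch below uses a hypothetical 2-2-1 architecture with made-up random weights (the actual figure is not reproduced here) to verify numerically that a two-layer linear network with per-unit scaling $k$ collapses to a single weight vector:

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical 2-2-1 network, linear activations; each unit's output
# is additionally scaled by the constant k, as in the question.
k = 3.0
W1 = rng.normal(size=(2, 2))   # input -> hidden weights
w2 = rng.normal(size=(2,))     # hidden -> output weights

def two_layer(x):
    h = k * (x @ W1)           # linear hidden units, scaled by k
    return k * (h @ w2)        # linear output unit, scaled by k

# Collapsed single-layer equivalent: w_eq = k^2 * W1 @ w2.
w_eq = k * k * (W1 @ w2)

x = rng.normal(size=(2,))
print(np.allclose(two_layer(x), x @ w_eq))  # True
```

The composition of linear maps (and scalar multiplications) is itself linear, which is why the hidden layer adds no representational power here.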

Q12 – Function Representation and Network Capacity

Contributed by Pulkit Khandelwal.

Suppose we are given two types of activation functions, linear and hard threshold, defined as follows:

• Linear:  $y = w_{0} + \sum_{i}w_{i}x_{i}$
• Hard Threshold:  $y=\left\{ \begin{array}{@{}ll@{}} 1, & \text{if}\ w_{0} + \sum_{i}w_{i}x_{i} \geq 0 \\ 0, & \text{otherwise} \end{array}\right.$

Which of the following can be represented exactly by a neural network with one hidden layer? You may use linear and/or hard-threshold activation functions. Justify each answer with a brief explanation.

1. polynomials of degree 2
2. polynomials of degree 1
3. hinge loss
4. piecewise constant functions
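As a hint for part 4, here is a minimal sketch (with hypothetical weights) of how one hidden layer of hard-threshold units feeding a linear output unit can represent a piecewise constant "box" function:

```python
def step(z):
    """Hard threshold: 1 if the pre-activation is >= 0, else 0."""
    return 1.0 if z >= 0 else 0.0

# One hidden layer of threshold units, linear output unit:
# f(x) = step(x - 1) - step(x - 2) equals 1 on [1, 2) and 0 elsewhere.
def box(x, a=1.0, b=2.0):
    h1 = step(x - a)   # fires once x >= a
    h2 = step(x - b)   # fires once x >= b
    return h1 - h2     # linear combination in the output unit

print([box(x) for x in (0.5, 1.5, 2.5)])  # [0.0, 1.0, 0.0]
```

Summing several such boxes with different heights and breakpoints builds up an arbitrary piecewise constant function with finitely many pieces.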

Q11 – Pooling in transpose convolution (deconvolution) layers

1. Draw the simple computational graph (circuit) for $f(x_1,x_2,x_3,x_4)=x_1x_2+x_3x_4$ and compute all the derivatives of $f$ with respect to the inputs ($\frac{\partial f}{\partial x_1},\frac{\partial f}{\partial x_2},\frac{\partial f}{\partial x_3},\frac{\partial f}{\partial x_4}$) using the chain rule ($\frac{\partial f}{\partial x}=\frac{\partial f}{\partial q}\frac{\partial q}{\partial x}$).
2. Show how the gradient of $f$ with respect to $x_1$ changes when we increase the input $x_2$ by $\Delta h$.
3. Given a function $g(f(x_1, x_2, x_3, x_4))$, where $f$ is the function above and $g(t) = \sigma(t)$ is the sigmoid function, compute the derivative of $g$ with respect to the input $x_1$ ($\frac{\partial g}{\partial x_1}$).
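A numerical check of these derivatives (the input values below are arbitrary) might look like:

```python
import math

def f(x1, x2, x3, x4):
    return x1 * x2 + x3 * x4

def sigmoid(t):
    return 1.0 / (1.0 + math.exp(-t))

x = (1.0, 2.0, -3.0, 0.5)

# Analytic gradients from the chain rule:
# df/dx1 = x2, df/dx2 = x1, df/dx3 = x4, df/dx4 = x3.
grad_f = (x[1], x[0], x[3], x[2])

# dg/dx1 = sigma'(f) * df/dx1, with sigma'(t) = sigma(t) * (1 - sigma(t)).
s = sigmoid(f(*x))
dg_dx1 = s * (1.0 - s) * x[1]

# Central finite-difference check of df/dx1.
h = 1e-6
num = (f(x[0] + h, *x[1:]) - f(x[0] - h, *x[1:])) / (2 * h)
print(grad_f)               # (2.0, 1.0, 0.5, -3.0)
print(abs(num - 2.0) < 1e-4)
```

Note that $\partial f / \partial x_1 = x_2$ depends on $x_2$, which is exactly what part 2 asks you to quantify.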