# Q19 – Linear Regression

Consider a linear regression problem with input data $\boldsymbol{X} \in \mathbb{R}^{n\times d}$, weights $\boldsymbol{w} \in \mathbb{R}^{d \times 1}$, and targets $\boldsymbol{y} \in \mathbb{R}^{n \times 1}$. Now suppose that dropout is applied to the input units with probability $p$.

1) Rewrite the input data matrix taking into account the probability of each unit being dropped (Hint: whether each unit is dropped is a Bernoulli random variable with probability $p$).

2) What is the cost function of linear regression with dropout?

3) Show that applying dropout to the linear regression problem above can be seen as using L2 regularization in the loss function.
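
One way to sanity-check an answer to this question is to compare a Monte Carlo estimate of the expected dropout loss against a candidate closed form. The sketch below assumes that $p$ is the probability of *dropping* a unit, that no rescaling of the kept inputs is applied, and that the loss is the plain sum of squared errors; the function names and the toy data are made up for illustration.

```python
import numpy as np

def closed_form_expected_loss(X, w, y, p):
    """Candidate closed form for E[||y - (M * X) w||^2], M_ij ~ Bernoulli(1 - p)."""
    keep = 1.0 - p
    # Squared error of the mean prediction (inputs shrunk by the keep probability).
    fit = np.sum((y - keep * (X @ w)) ** 2)
    # Variance of the prediction: p (1 - p) * sum_j w_j^2 * sum_i X_ij^2.
    penalty = p * keep * np.sum((X ** 2).sum(axis=0) * w.ravel() ** 2)
    return fit + penalty

def monte_carlo_expected_loss(X, w, y, p, n_samples=20000, seed=0):
    """Average the squared error over randomly sampled dropout masks."""
    rng = np.random.default_rng(seed)
    # Each entry is kept with probability 1 - p (assumption: p is the drop probability).
    masks = rng.random((n_samples,) + X.shape) >= p
    preds = (masks * X) @ w                      # shape (n_samples, n, 1)
    losses = ((y - preds) ** 2).sum(axis=(1, 2))
    return losses.mean()

# Toy data, chosen arbitrarily for the check.
rng = np.random.default_rng(1)
X = rng.normal(size=(20, 3))
w = np.array([[1.0], [-2.0], [0.5]])
y = X @ w + 0.1 * rng.normal(size=(20, 1))
p = 0.3

closed = closed_form_expected_loss(X, w, y, p)
estimate = monte_carlo_expected_loss(X, w, y, p)
print(closed, estimate)
```

The second term of the closed form is a quadratic penalty on the weights, scaled per feature by the squared column norms of $\boldsymbol{X}$; it reduces to a standard L2 penalty when the columns have equal norm.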

# Q18 – Regularization

Show that L2 regularization applied to linear regression with weights $\boldsymbol{w}$, input data $\boldsymbol{x}$, and targets $\boldsymbol{y}$, using a mean squared error loss function, corresponds to assuming a Gaussian prior over the weights.

# Q17 – ConvNet Invariances

Question 1: A convolutional neural network (CNN) can be “insensitive” to slight spatial variations in the input data, such as translation. Compared with regular feed-forward neural networks, the CNN architecture has two components responsible for this kind of insensitivity. Explain what those components are and how a CNN can ignore small translations in the input data.

# Q16 – Linear RNN Dynamics

Consider the behavior of a linear RNN:
$h_t = W h_{t-1} + U x_{t} + b$

1.  Write $h_t$ as a function of $h_0$.
2.  Write out $\frac{d h_t}{d h_0}$.
3.  What happens when $t \to \infty$? Under what conditions?
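
An answer to these three parts can be checked numerically. The sketch below unrolls the recurrence step by step and compares it with one candidate closed form, $h_t = W^t h_0 + \sum_{k=1}^{t} W^{t-k}(U x_k + b)$, whose Jacobian with respect to $h_0$ is $W^t$. The matrices, inputs, and function names are arbitrary choices made for illustration.

```python
import numpy as np

def unroll(W, U, b, h0, xs):
    """Apply h_t = W h_{t-1} + U x_t + b once per input in xs."""
    h = h0
    for x in xs:
        h = W @ h + U @ x + b
    return h

def closed_form(W, U, b, h0, xs):
    """Candidate closed form: W^t h_0 + sum_k W^(t-k) (U x_k + b)."""
    t = len(xs)
    h = np.linalg.matrix_power(W, t) @ h0
    for k, x in enumerate(xs, start=1):
        h = h + np.linalg.matrix_power(W, t - k) @ (U @ x + b)
    return h

# Arbitrary example with eigenvalues of W equal to 0.5 and 0.4.
W = np.array([[0.5, 0.1], [0.0, 0.4]])
U = np.array([[1.0], [0.5]])
b = np.array([0.1, -0.2])
h0 = np.array([1.0, 2.0])
xs = [np.array([0.3]), np.array([-1.0]), np.array([0.7]), np.array([0.2])]

h_unrolled = unroll(W, U, b, h0, xs)
h_closed = closed_form(W, U, b, h0, xs)

# The Jacobian d h_t / d h_0 is W^t; its size is governed by the spectral radius of W.
spectral_radius = np.max(np.abs(np.linalg.eigvals(W)))
jacobian_norm_50 = np.linalg.norm(np.linalg.matrix_power(W, 50))
print(h_unrolled, h_closed, spectral_radius, jacobian_norm_50)
```

With a spectral radius below 1, as in this example, the norm of $W^t$ shrinks toward zero as $t$ grows; scaling `W` so that its spectral radius exceeds 1 makes the same quantity grow instead.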

# Q15 – Softmax and Cross Entropy

The softmax function for $m$ classes is given by

$p_i = \frac{e^{x_i}}{\sum_{j=1}^m e^{x_j}} \text{ for } i = 1\ldots m$.

It transforms a vector $(x_i)$ of real values into a probability mass vector for a categorical distribution.  It is often used in conjunction with the cross-entropy loss
$L(x, y) = - \sum_{i=1}^m y_i \log p_i$

1. Find a simplified expression for $p_i$ when $m = 2$.
2. Differentiate $p_i$ with respect to $x_k$.
3. Differentiate $L$ with respect to $x_k$.
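
Derivatives like these are easy to verify with finite differences. The sketch below assumes $y$ is a one-hot (or otherwise normalized) target vector and checks two candidate results: $\partial p_i / \partial x_k = p_i(\delta_{ik} - p_k)$ and $\partial L / \partial x_k = p_k - y_k$. It also checks that the two-class softmax matches a logistic sigmoid of the difference of the inputs. The helper names are made up for illustration.

```python
import numpy as np

def softmax(x):
    """Softmax with the usual max-subtraction for numerical stability."""
    z = np.exp(x - np.max(x))
    return z / z.sum()

def cross_entropy(x, y):
    """L(x, y) = -sum_i y_i log p_i, with p = softmax(x)."""
    return -np.sum(y * np.log(softmax(x)))

def softmax_jacobian(x):
    """Candidate Jacobian: J[i, k] = p_i * (delta_ik - p_k)."""
    p = softmax(x)
    return np.diag(p) - np.outer(p, p)

def numerical_grad(f, x, eps=1e-6):
    """Central finite-difference gradient of a scalar function f at x."""
    g = np.zeros_like(x)
    for k in range(x.size):
        step = np.zeros_like(x)
        step[k] = eps
        g[k] = (f(x + step) - f(x - step)) / (2 * eps)
    return g

x = np.array([0.5, -1.2, 2.0])
y = np.array([0.0, 0.0, 1.0])  # one-hot target (assumption)

# Gradient of the loss: candidate analytic form versus finite differences.
analytic_grad = softmax(x) - y
numeric_grad = numerical_grad(lambda v: cross_entropy(v, y), x)

# First row of the Jacobian: derivatives of p_1 with respect to each x_k.
numeric_row0 = numerical_grad(lambda v: softmax(v)[0], x)

# Two-class case: p_1 should equal sigmoid(x_1 - x_2).
x2 = np.array([0.3, -0.8])
sigmoid_form = 1.0 / (1.0 + np.exp(-(x2[0] - x2[1])))
print(analytic_grad, numeric_grad, sigmoid_form)
```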

# 09 – Demonstration of Implementing Convnets

In this lecture I’ll walk us through training a convnet to do MNIST classification.  If time permits I’ll take requests on demonstrating other methods for trying to improve results.

Code will be posted here beforehand, but I’ll try to implement it in class without using any notes.

https://github.com/alexmlamb/convnet_demo_ift6266

I’ll also explain the class project!

https://ift6266h17.wordpress.com/project-description/