Q19 – Linear Regression

Consider a linear regression problem with input data \mathbf{X} \in \mathbb{R}^{n \times d}, weights \mathbf{w} \in \mathbb{R}^{d \times 1}, and targets \mathbf{y} \in \mathbb{R}^{n \times 1}. Now, suppose that dropout is applied to the input units with probability p.

1) Rewrite the input data matrix taking into account the probability of each unit being dropped out (Hint: whether each unit is dropped is governed by a Bernoulli random variable with parameter p).

2) What is the cost function of linear regression with dropout?

3) Show that applying dropout to the aforementioned linear regression problem can be seen as using L2 regularization in the loss function.
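
A sketch of the standard derivation (the mask notation below is our own; we write q = 1 - p for the keep probability). Let R \in \{0,1\}^{n \times d} have i.i.d. entries R_{ij} \sim \mathrm{Bernoulli}(q), so the dropped-out input is \tilde{X} = R \odot X (elementwise product) and the dropout cost is the expected squared error over the masks:

J(\mathbf{w}) = \mathbb{E}_{R}\left[ \| \mathbf{y} - (R \odot X)\mathbf{w} \|^{2} \right]

Using \mathbb{E}[\tilde{X}] = qX and the independence of the entries of R, decomposing the expectation into a squared bias and a variance term gives

J(\mathbf{w}) = \| \mathbf{y} - qX\mathbf{w} \|^{2} + q(1-q)\sum_{j=1}^{d} (X^{\top}X)_{jj}\, w_{j}^{2}

The second term is an L2 penalty on \mathbf{w} (with data-dependent scaling), which is exactly the claim in part 3.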

Q17 – ConvNet Invariances

Question 1: A convolutional neural network (CNN) has the ability to be “insensitive” to slight spatial variations in the input data, such as translation. In comparison with regular feed-forward neural networks, the CNN architecture has two components responsible for this kind of insensitivity. Explain which those components are and how a CNN can ignore small translations in the input data.
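
The question points at two components; the following minimal NumPy sketch (our own example, not part of the question) illustrates how one of them, max pooling, can absorb a small translation:

```python
import numpy as np

def max_pool_1d(x, window):
    """Non-overlapping 1-D max pooling."""
    return x[: len(x) // window * window].reshape(-1, window).max(axis=1)

# A 1-D "feature map" with two activation peaks, and a copy
# translated one pixel to the right.
x = np.array([5.0, 0, 0, 3.0, 0, 0, 0, 0, 0])
x_shifted = np.roll(x, 1)

# With window 3, each peak stays inside its pooling region, so the
# pooled representation is identical for both inputs. (Invariance
# holds only for shifts small enough not to cross pool boundaries.)
print(max_pool_1d(x, 3))          # [5. 3. 0.]
print(max_pool_1d(x_shifted, 3))  # [5. 3. 0.]
```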

Q13 – Activation Functions II

Contributed by Pulkit Khandelwal.

Consider a neural network as shown in the Figure below. The network has linear activation functions. Let the various weights be defined as shown in the figure, and suppose the output of each unit is multiplied by some constant k.

Answer the following questions:

  1. Re-design the neural network to compute the same function without using any hidden units. Express the new weights in terms of the old weights and draw the resulting perceptron (see the sketch after this list).
  2. Can the space of functions that is represented by the above artificial neural network also be represented by linear regression?
  3. Is it always possible to express a neural network made up of only linear units as an equivalent network without a hidden layer? Give a brief justification.
  4. Let the hidden units use sigmoid activation functions and let the output unit use a threshold activation function. Find weights which cause this network to compute the XOR of X_{1} and X_{2} for binary-valued X_{1} and X_{2}. Assume that there are no bias terms.
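
For item 1, a minimal sketch of the collapse argument, assuming the figure shows a single hidden layer with weight matrices W^{(1)} (input to hidden) and W^{(2)} (hidden to output); these symbols are our assumption, as the figure is not reproduced here:

\mathbf{y} = k\, W^{(2)}\left( k\, W^{(1)} \mathbf{x} \right) = k^{2}\, W^{(2)} W^{(1)} \mathbf{x}

so a perceptron with weights W' = k^{2} W^{(2)} W^{(1)} computes the same function. The underlying fact, that a composition of linear maps is itself linear, is also the key to items 2 and 3.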

Q12 – Function Representation and Network Capacity

Contributed by Pulkit Khandelwal.

Let us say that we are given two types of activation functions, linear and hard threshold, as defined below:

  • Linear:  y = w_{0} + \sum_{i}w_{i}x_{i}
  • Hard Threshold:  y = \begin{cases} 1, & \text{if}\ w_{0} + \sum_{i}w_{i}x_{i} \geq 0 \\ 0, & \text{otherwise} \end{cases}

Which of the following can be exactly represented by a neural network with one hidden layer? You can use linear and/or threshold activation functions. Justify your answer with a brief explanation.

  1. polynomials of degree 2
  2. polynomials of degree 1
  3. hinge loss
  4. piecewise constant functions (see the sketch after this list)
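
For item 4, a minimal NumPy sketch (our own illustration, with made-up step locations and jump heights) of how a weighted sum of hard-threshold hidden units realizes a piecewise constant function with finitely many pieces:

```python
import numpy as np

def hard_threshold(z):
    """Hard threshold activation: 1 if z >= 0, else 0."""
    return (z >= 0).astype(float)

def piecewise_constant(x, steps, jumps, base=0.0):
    """One hidden layer of threshold units, linear output unit.

    Hidden unit j computes hard_threshold(x - steps[j]), i.e. a
    threshold unit with w0 = -steps[j] and w1 = 1; the linear output
    adds jumps[j] for each unit that fires, producing a staircase
    (piecewise constant) function of x.
    """
    x = np.asarray(x, dtype=float)
    h = hard_threshold(x[:, None] - np.asarray(steps))  # hidden layer
    return base + h @ np.asarray(jumps, dtype=float)    # linear output

# Target: f(x) = 0 on (-inf, 1), 2 on [1, 3), -1 on [3, inf).
steps = [1.0, 3.0]   # where the function jumps
jumps = [2.0, -3.0]  # jump heights (0 -> 2, then 2 - 3 = -1)

print(piecewise_constant([0.0, 2.0, 5.0], steps, jumps))  # [ 0.  2. -1.]
```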