In this lecture we finish up our discussion of training neural networks and we introduce Convolutional Neural Networks.

Lecture 04 CNNs (slides modified from Hugo Larochelle’s course notes)

**Reference:** (* = you are responsible for all of this material)

- *Chapter 9 of the Deep Learning textbook (Sections 9.10 and 9.11 are optional).
- Andrej Karpathy’s excellent tutorial on CNNs.

Great 4-minute video to help visualize the different layers in a convolutional network. It helps to understand how a conv net can detect faces, wrinkles, and other high-level features.

On page 333 of the book, it says: “Convolution usually corresponds to a very sparse matrix …this is because the kernel is usually much smaller than the input image.”

I don’t understand what one has to do with the other. Is it because the kernel is learned? It is not immediately obvious to me from Figure 9.1.

It means that the number of activated outputs is far smaller than the number of input pixels.

For example, consider an MNIST image of shape (28,28,1) as the input. This input matrix typically has values in all of its elements, i.e., very few pixels are exactly 0.0. Now suppose we convolve this image with an edge-detection filter of shape (4,4,1) (with zero padding to keep the dimensions the same) and push the output through a ReLU. The resulting output matrix will have nonzero values only at the points that actually contain edges, say 100 pixels.

Therefore:

[Input_matrix: 784 pixels with values] -> [Edge detector] -> [ReLU] -> [Output_matrix: 100 pixels with values, 684 pixels with 0.0s]
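The pipeline above can be sketched in a few lines of NumPy. This is only an illustration under stated assumptions: the image is a synthetic stand-in for an MNIST digit (a bright square on a dim, nonzero background), the 4x4 kernel weights are a hypothetical vertical-edge detector, and `correlate2d_same` is a helper written for this example, not a library function.

```python
import numpy as np

def correlate2d_same(image, kernel):
    """Slide `kernel` over `image` with zero padding so the output keeps the input shape."""
    kh, kw = kernel.shape
    # An even-sized kernel cannot be centred, so the padding is asymmetric.
    top, left = (kh - 1) // 2, (kw - 1) // 2
    bottom, right = kh - 1 - top, kw - 1 - left
    padded = np.pad(image, ((top, bottom), (left, right)))
    out = np.zeros(image.shape, dtype=float)
    for i in range(image.shape[0]):
        for j in range(image.shape[1]):
            out[i, j] = np.sum(padded[i:i + kh, j:j + kw] * kernel)
    return out

# Synthetic 28x28 input (assumption): every pixel is nonzero, as in the example above.
image = np.full((28, 28), 0.25)
image[8:20, 8:20] = 1.0

# Hypothetical 4x4 edge detector: responds to dark-to-bright transitions, left to right.
kernel = np.array([[-1.0, -1.0, 1.0, 1.0]] * 4)

# ReLU keeps only the positive responses.
activations = np.maximum(correlate2d_same(image, kernel), 0.0)

nonzero_in = np.count_nonzero(image)
nonzero_out = np.count_nonzero(activations)
print(f"nonzero inputs: {nonzero_in}, nonzero outputs after ReLU: {nonzero_out}")
```

Most of the output is exactly zero because the kernel weights sum to zero, so flat regions give no response. Note that the zero padding itself creates an artificial dark-to-bright transition along one border, which also shows up as nonzero activations.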

It may be a bit late to answer this question, but the picture on slide 16 of Convolutional NNs II – Lecture 06 CNNs explains it in a more representative way.

In this picture we convolve a 3×3 kernel over a 4×4 matrix (i.e., a vector of dimension 16). This leads to an output of size 2×2 (i.e., a vector of dimension 4).

This convolution can be represented as a matrix multiplication in which the input and output are unrolled into vectors, reading left to right and top to bottom. The matrix then has size 4×16. Because the kernel is smaller than the input, each row holds at most 9 nonzero entries (one per kernel weight) and the remaining 7 are zero, so the matrix is sparse.
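To make the unrolling concrete, here is a minimal NumPy sketch that builds this matrix for a 3×3 kernel and a 4×4 input and checks it against a direct sliding-window computation. The function name `conv_matrix` and the kernel and input values are assumptions chosen for illustration. As in most deep learning libraries, the kernel is not flipped, so strictly speaking this is a cross-correlation.

```python
import numpy as np

def conv_matrix(kernel, in_h, in_w):
    """Build M so that M @ x.ravel() equals the 'valid' sliding-window
    product of `kernel` over an in_h x in_w input x (row-major unrolling)."""
    kh, kw = kernel.shape
    out_h, out_w = in_h - kh + 1, in_w - kw + 1
    M = np.zeros((out_h * out_w, in_h * in_w))
    for i in range(out_h):
        for j in range(out_w):
            row = i * out_w + j  # index of output pixel (i, j)
            for a in range(kh):
                for b in range(kw):
                    col = (i + a) * in_w + (j + b)  # index of input pixel it touches
                    M[row, col] = kernel[a, b]
    return M

kernel = np.arange(1.0, 10.0).reshape(3, 3)  # arbitrary nonzero weights
x = np.arange(16.0).reshape(4, 4)            # arbitrary 4x4 input

M = conv_matrix(kernel, 4, 4)

# Direct computation: slide the kernel over the four valid positions.
direct = np.array([[np.sum(x[i:i + 3, j:j + 3] * kernel) for j in range(2)]
                   for i in range(2)])
via_matrix = (M @ x.ravel()).reshape(2, 2)

print("matrix shape:", M.shape)
print("nonzeros per row:", np.count_nonzero(M, axis=1))
print("results agree:", np.allclose(direct, via_matrix))
```

The sparsity grows quickly with input size: for a 28×28 input and the same 3×3 kernel, the matrix would be 676×784, still with only 9 nonzero entries per row.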

The other answers are very interesting, but I believe they do not directly answer your question. Yes, the kernels are what is learned. Hope this helps!

In case someone has a similar question, it might be helpful to look at the section “Convolution as a matrix operation” on the following page:

http://deeplearning.net/software/theano/tutorial/conv_arithmetic.html

In addition to the other answers, I find the image there very helpful for fully grasping why convolutions correspond to fairly sparse matrices.

Slide 34 shows how certain images are chosen because they are the ones that maximize the output of the kernels. After these images are chosen, the parameters of the kernels are further refined: they are optimized to maximize their output with respect to those images, so as to learn their features better.

I asked in class: does training the kernels on particular images cause a loss of generalization ability? In other words, doesn’t this overfit the model to detect these images?

The short discussion of this question produced the following answer: further adapting the kernels to these images will cause a generalization problem if the images are not representative. Overfitting to those images is not impossible, but carefully choosing images that are already representative of the patterns captured by the kernels makes this refinement something that helps, rather than hinders, generalization.

Note: This comment is only to leave a trace on the blog of something that I asked in class. I frequently asked questions, but was very shy about coming to the blog to post them.
