# IFT6266 – H2017 Deep Learning

## A Graduate Course Offered at Université de Montréal

# Q11 – Pooling in transpose convolution (deconvolution) layers

## 4 thoughts on “Q11 – Pooling in transpose convolution (deconvolution) layers”


Contributed by Vasken Dermardiros

- What is the role of pooling in convolutional layers?
- What would pooling correspond to in up-convolution (transpose convolution) layers?



(Very short) answer for Q1:

Pooling layers are used in CNNs in order to:

- Reduce the number of hidden units (fewer parameters, hence faster computation)
- Introduce invariance to small local translations
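To make these two points concrete, here is a minimal NumPy sketch of 2×2 max pooling (my own illustration, not something from the course material):

```python
import numpy as np

def max_pool_2x2(x):
    """2x2 max pooling with stride 2 on a (H, W) array (H, W even)."""
    h, w = x.shape
    # Group pixels into 2x2 windows, then take the max of each window.
    return x.reshape(h // 2, 2, w // 2, 2).max(axis=(1, 3))

x = np.array([[1., 2., 5., 6.],
              [3., 4., 7., 8.],
              [9., 1., 2., 3.],
              [4., 5., 6., 7.]])
print(max_pool_2x2(x))
# [[4. 8.]
#  [9. 7.]]
```

The 4×4 input becomes 2×2 (fewer units downstream), and shifting a maximum within its window leaves the output unchanged (local translation invariance).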

Concerning Q2, I’m not sure of the right answer.


I’m not so sure either for Q2; it’s not really clear what the desired answer is here. However, I can imagine that introducing pooling in a deconvolutional layer could act as a sort of regularizer, smoothing out artifacts that would otherwise be amplified by further deconvolutions. And you would get roughly the same effects as in a convolutional layer: the size after the deconvolution can be constrained, and you get built-in invariance to local translations.


I think I might have found something related to a possible answer in the course textbook.

In section 20.10.6 (p.695):

“The primary mechanism for discarding information in a convolutional recognition network is the pooling layer. The generator network seems to need to add information. We cannot put the inverse of a pooling layer into the generator network because most pooling functions are not invertible. A simpler operation is to merely increase the spatial size of the representation.

An approach that seems to perform acceptably is to use an “un-pooling” as introduced by Dosovitskiy et al. (2015). This layer corresponds to the inverse of the max-pooling operation under certain simplifying conditions. First, the stride of the max-pooling operation is constrained to be equal to the width of the pooling region. Second, the maximum input within each pooling region is assumed to be the input in the upper-left corner. Finally, all non-maximal inputs within each pooling region are assumed to be zero.

These are very strong and unrealistic assumptions, but they do allow the max-pooling operator to be inverted. The inverse un-pooling operation allocates a tensor of zeros, then copies each value from spatial coordinate i of the input to spatial coordinate i × k of the output. The integer value k defines the size of the pooling region. Even though the assumptions motivating the definition of the un-pooling operator are unrealistic, the subsequent layers are able to learn to compensate for its unusual output, so the samples generated by the model as a whole are visually pleasing.”

(I added the paragraphs for readability).
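If I read the quote correctly, this un-pooling can be sketched in a few lines of NumPy (my own hedged reading of the description; the function name is mine):

```python
import numpy as np

def unpool_upper_left(x, k=2):
    """Un-pooling as described in the quote: each input value at (i, j)
    is copied to (i*k, j*k) of a zero tensor, i.e. the upper-left corner
    of its k x k block; all other positions stay zero."""
    h, w = x.shape
    out = np.zeros((h * k, w * k), dtype=x.dtype)
    out[::k, ::k] = x
    return out

x = np.array([[4., 8.],
              [9., 7.]])
print(unpool_upper_left(x))
# [[4. 0. 8. 0.]
#  [0. 0. 0. 0.]
#  [9. 0. 7. 0.]
#  [0. 0. 0. 0.]]
```

Each value lands in the upper-left corner of its k × k block, matching the simplifying assumptions in the quote.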

Here is the paper in question: https://www.robots.ox.ac.uk/~vgg/rg/papers/Dosovitskiy_Learning_to_Generate_2015_CVPR_paper.pdf

What do you think of this solution?


For Q2, here is what I recall from what Aaron said in class Monday (which was on an alternative interpretation of the question):

Let’s draw a parallel with strides. In a normal convolution, using a stride of size k means that we shift the kernel by k between each product. OK, now what happens if we use a stride of size k in a transposed convolution? Well, it turns out that we insert k − 1 zeros between neighbouring pixels, and then do our (transposed) convolution.
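A quick NumPy sketch of that zero-insertion view (my own illustration; `dilate_input` is a made-up name):

```python
import numpy as np

def dilate_input(x, k):
    """Insert k-1 zeros between neighbouring pixels of a (H, W) input.
    A stride-k transposed convolution is equivalent to a stride-1
    convolution applied to this dilated input (plus suitable padding)."""
    h, w = x.shape
    out = np.zeros((h + (h - 1) * (k - 1), w + (w - 1) * (k - 1)), dtype=x.dtype)
    out[::k, ::k] = x
    return out

x = np.array([[1., 2.],
              [3., 4.]])
print(dilate_input(x, 2))
# [[1. 0. 2.]
#  [0. 0. 0.]
#  [3. 0. 4.]]
```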

Cool. Now, what about max pooling? During a normal convolution, we subsample the image, keeping only the maximum value in each region. For a transposed convolution, what would the corresponding operation do? Intuitively, some kind of upscaling. But what kind of upscaling should we use? I don’t think there is one way that is clearly better than the others. Aaron said that if we remember where each max comes from (i.e. the argmax), assuming we are in some kind of encoder-decoder setting, we could put each value back at the same position (and put zeros everywhere else).
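Here is a rough NumPy sketch of that argmax-based un-pooling, as I understood it (the names and details are my own, not Aaron’s code):

```python
import numpy as np

def max_pool_with_argmax(x, k=2):
    """Max pooling on a (H, W) array that also returns, per window,
    the flat offset (row_in_window * k + col_in_window) of the max."""
    h, w = x.shape
    windows = (x.reshape(h // k, k, w // k, k)
                .transpose(0, 2, 1, 3)
                .reshape(h // k, w // k, k * k))
    return windows.max(axis=-1), windows.argmax(axis=-1)

def max_unpool(pooled, idx, k=2):
    """Put each pooled value back at its remembered argmax position,
    zeros everywhere else (encoder-decoder style un-pooling)."""
    ph, pw = pooled.shape
    out = np.zeros((ph * k, pw * k), dtype=pooled.dtype)
    for i in range(ph):
        for j in range(pw):
            di, dj = divmod(int(idx[i, j]), k)
            out[i * k + di, j * k + dj] = pooled[i, j]
    return out

x = np.array([[1., 2., 5., 6.],
              [3., 4., 7., 8.],
              [9., 1., 2., 3.],
              [4., 5., 6., 7.]])
p, idx = max_pool_with_argmax(x)
print(max_unpool(p, idx))
# [[0. 0. 0. 0.]
#  [0. 4. 0. 8.]
#  [9. 0. 0. 0.]
#  [0. 0. 0. 7.]]
```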

We could think of other ways of doing this, the simplest being to do exactly what we do for strides. The values could also be placed at random within the local region.

It wasn’t discussed in class, but we could also ask ourselves the same question about zero padding: “What is the operation corresponding to zero padding in a transposed convolution?”

If some of you want more information on convolution arithmetic, you can check here:

http://deeplearning.net/software/theano/tutorial/conv_arithmetic.html
