Q11 – Pooling in transpose convolution (deconvolution) layers

Contributed by Vasken Dermardiros

  1. What is the role of pooling in convolutional layers?
  2. What would pooling result in for up-convolution (transposed convolution) layers?

4 thoughts on “Q11 – Pooling in transpose convolution (deconvolution) layers”

  1. (Very short) answer for Q1:

    Pooling layers are used in CNNs in order to:
    • Reduce the number of hidden units (fewer parameters, and therefore faster computation)
    • Introduce invariance to small local translations

    Concerning Q2, I’m not sure of the right answer.
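    The two effects listed above can be seen in a toy sketch of non-overlapping max pooling (plain NumPy, my own illustration, not code from the course): the 4×4 feature map shrinks by a factor of k in each dimension, and shuffling values inside one pooling window leaves the output unchanged.

```python
import numpy as np

def max_pool2d(x, k=2):
    """Non-overlapping k x k max pooling (stride = k)."""
    h, w = x.shape
    # reshape so each k x k block sits on its own axes, then take the max
    return x.reshape(h // k, k, w // k, k).max(axis=(1, 3))

x = np.arange(16.0).reshape(4, 4)
y = max_pool2d(x)
# y.shape == (2, 2): 4x fewer units feed the next layer
# y == [[ 5.,  7.],
#       [13., 15.]]
```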


  2. I’m not so sure either for Q2; it’s not really clear what the desired answer is here. However, I can imagine that introducing pooling in a deconvolutional layer could act as a sort of regularizer, smoothing out artifacts that could otherwise be amplified by further deconvolutions. And you would get roughly the same effects as for a convolutional layer: the size after the deconvolution can be constrained, and you get built-in invariance to local translations.


  3. I think I might have found something related to a possible answer in the course textbook.

    In section 20.10.6 (p.695):

    “The primary mechanism for discarding information in a convolutional recognition network is the pooling layer. The generator network seems to need to add information. We cannot put the inverse of a pooling layer into the generator network because most pooling functions are not invertible. A simpler operation is to merely increase the spatial size of the representation.

    An approach that seems to perform acceptably is to use an “un-pooling” as introduced by Dosovitskiy et al. (2015). This layer corresponds to the inverse of the max-pooling operation under certain simplifying conditions. First, the stride of the max-pooling operation is constrained to be equal to the width of the pooling region. Second, the maximum input within each pooling region is assumed to be the input in the upper-left corner. Finally, all non-maximal inputs within each pooling region are assumed to be zero.

    These are very strong and unrealistic assumptions, but they do allow the max-pooling operator to be inverted. The inverse un-pooling operation allocates a tensor of zeros, then copies each value from spatial coordinate i of the input to spatial coordinate i × k of the output. The integer value k defines the size of the pooling region. Even though the assumptions motivating the definition of the un-pooling operator are unrealistic, the subsequent layers are able to learn to compensate for its unusual output, so the samples generated by the model as a whole are visually pleasing.”

    (I added the paragraphs for readability).

    Here is the paper in question: https://www.robots.ox.ac.uk/~vgg/rg/papers/Dosovitskiy_Learning_to_Generate_2015_CVPR_paper.pdf

    What do you think of this solution?
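    The un-pooling operation the textbook describes is simple enough to sketch in a few lines of NumPy (my own illustration, not the authors’ code): each input value is copied to the upper-left corner of a k × k block of the output, and everything else is zero.

```python
import numpy as np

def unpool2d(x, k=2):
    """Un-pooling per the quote: copy the value at coordinate i of the
    input to coordinate i * k of the output, zeros everywhere else."""
    h, w = x.shape
    out = np.zeros((h * k, w * k), dtype=x.dtype)
    out[::k, ::k] = x  # upper-left corner of each k x k block
    return out

z = unpool2d(np.array([[1.0, 2.0], [3.0, 4.0]]))
# z == [[1., 0., 2., 0.],
#       [0., 0., 0., 0.],
#       [3., 0., 4., 0.],
#       [0., 0., 0., 0.]]
```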


  4. For Q2, here is what I recall from what Aaron said in class on Monday (based on an alternative interpretation of the question):

    Let’s draw a parallel with strides. In a normal convolution, using a stride of size k means that we shift our convolution window by k between each product. OK, now what happens if we use a stride of size k in a transposed convolution? Well, it turns out that we insert k − 1 zeros between adjacent pixels, and then do our (transposed) convolution.
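    To make the zero-insertion concrete, here is a small NumPy sketch (my own illustration) of the input dilation that a stride-k transposed convolution effectively performs before the actual convolution runs:

```python
import numpy as np

def dilate_input(x, k=2):
    """Insert k - 1 zeros between adjacent pixels, as a stride-k
    transposed convolution effectively does before convolving."""
    h, w = x.shape
    out = np.zeros(((h - 1) * k + 1, (w - 1) * k + 1), dtype=x.dtype)
    out[::k, ::k] = x  # original pixels land every k-th position
    return out

d = dilate_input(np.array([[1.0, 2.0], [3.0, 4.0]]))
# d == [[1., 0., 2.],
#       [0., 0., 0.],
#       [3., 0., 4.]]
```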
    Cool. Now, what about max pooling? During a normal convolution, we subsample the image, keeping only the maximum value in each region. For a transposed convolution now, what would it do? Intuitively, we could say that an upscaling would be done. What kind of upscaling should we use? I don’t think there is a way which is clearly better than the others. Aaron said that if we remember where each max comes from (i.e. the argmax), assuming that we are in some kind of encoder-decoder setting, we could put it back at the same position (and put zeros everywhere else).
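    What Aaron described sounds like max-unpooling with remembered argmax indices, as used in encoder-decoder nets (PyTorch exposes it as MaxUnpool2d). A NumPy sketch of the idea (my own illustration, assuming non-overlapping 2×2 pooling):

```python
import numpy as np

def max_pool_argmax(x, k=2):
    """Max pool and remember the flat index (argmax) of each maximum."""
    h, w = x.shape
    blocks = (x.reshape(h // k, k, w // k, k)
               .transpose(0, 2, 1, 3)
               .reshape(h // k, w // k, k * k))
    local = blocks.argmax(axis=-1)                 # position inside each block
    rows = np.arange(h // k)[:, None] * k + local // k
    cols = np.arange(w // k)[None, :] * k + local % k
    return blocks.max(axis=-1), rows * w + cols    # pooled values, flat argmax

def max_unpool(pooled, idx, shape):
    """Put each max back at its remembered position, zeros everywhere else."""
    out = np.zeros(shape, dtype=pooled.dtype).ravel()
    out[idx.ravel()] = pooled.ravel()
    return out.reshape(shape)

x = np.arange(16.0).reshape(4, 4)
p, idx = max_pool_argmax(x)
u = max_unpool(p, idx, x.shape)
# each block's max returns to its original position; the rest is zero
```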
    We could think of other ways of doing this, the simplest being exactly what we do for strides. The values could also be put at random positions within the local region.

    It wasn’t discussed in class, but we could also ask ourselves the same question about zero padding: “What is the corresponding operation to zero padding in a transposed convolution?”

    If some of you want more information on convolution arithmetic, you can check here:

