Student Forum

You can submit your questions regarding Theano, Pylearn2, the class project or anything related as comments to this page.

139 thoughts on “Student Forum”

  1. Hi!

    When I try to run my first GAN on Hades, I get a MemoryError. It was working perfectly with my previous models, the GAN runs fine on my local machine, and I haven’t changed anything in the .pbs script.

    Here is the error:

    Error allocating 37748736 bytes of device memory (out of memory). Driver report 14974976 bytes free and 1610285056 bytes total
    Using gpu device 1: GeForce GTX 580 (CNMeM is disabled, cuDNN not available)
    Traceback (most recent call last):
    File "/home2/ift6ed13/p3.5/lib/python3.5/site-packages/theano/compile/function_module.py", line 884, in __call__
    self.fn() if output_subset is None else\
    RuntimeError: GpuCorrMM failed to allocate working memory of 2304 x 4096

    Note that I am not even loading the dataset into RAM (via shared variables)… So I don’t understand how I could have a memory problem, since the previous models were working.

    Here is the script I use:

    #!/bin/bash
    #PBS -l walltime=120:00:00
    #PBS -l nodes=1:ppn=1 -l mem=12gb
    #PBS -o /home2/ift6ed13/results/log.out
    #PBS -r n

    module add python/3.5.1
    module add openblas/0.2.18
    module add CUDA/7.5

    source /home2/ift6ed13/p3.5/bin/activate

    cd /home2/ift6ed13/code/

    python Launch.py

    I believe 12 GB is the maximum memory available for one GPU. Setting ppn=2 doesn’t change anything. Any ideas?

    Thanks!


    • This is caused by running out of GPU memory (Hades has 1.5 GB per GPU: 1610285056 bytes), rather than RAM.

      The only ways to solve this are to 1) reduce your batch size, 2) reduce your model’s complexity, or 3) run on a GPU with more memory (e.g. AWS and Google Cloud offer GPUs with 12 GB of memory).

      You can probably use Google’s $300 trial credit to run GPU instances for free if you don’t want to pay.
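      To see why reducing the batch size is the first lever: activation memory scales linearly with it. A back-of-the-envelope sketch (the layer dimensions below are made up for illustration, not taken from the model above):

```python
def activation_mib(batch, channels, height, width, bytes_per_value=4):
    """MiB needed to hold one layer's float32 activations on the GPU."""
    return batch * channels * height * width * bytes_per_value / 1024**2

# Hypothetical conv layer output: halving the batch halves the memory.
full = activation_mib(128, 64, 64, 64)  # 128.0 MiB
half = activation_mib(64, 64, 64, 64)   # 64.0 MiB
```

      Gradient buffers and convolution workspaces (like the GpuCorrMM allocation in the error above) come on top of this, across every layer of the network.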


      • Thanks for your answer!

        It’s very strange, because the model doesn’t seem that big to me, and it crashed even with a batch size of 25. But it makes sense if there’s only 1.5 GB per GPU…


      • OK, it’s fine now: I significantly reduced the number of parameters and it runs, with a minibatch size of 50!


  2. A simple calculation shows that for 82,000 images in float32 (4 bytes per value), you need 8 GB of memory if you want to load everything into RAM, and that is only for the original images. If you also want to use the cropped images and targets, it requires 20 GB in total. With the test images (again original, cropped and targets) you probably exceed 32 GB. You are sure to get a memory error unless you have 64 GB of memory. So it depends on how much data you load into RAM at a time; once again, there is no way to load all the data into memory in one shot. However, our TAs did not warn us about that at the very beginning.


  3. With parameters, it is quite unlikely that you can exceed 1.5 GB unless the net is huge. It is always possible to print a summary of how many parameters you use in total in the net: my VAE uses 33M parameters, so with float32 that gives approximately 130 MB of RAM, which is nothing.
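    Checking the parameter count is quick to do by hand. A NumPy sketch with hypothetical shapes (in a Theano/Lasagne setup the arrays would come from something like lasagne.layers.get_all_param_values):

```python
import numpy as np

# Hypothetical parameter arrays for a single dense layer (weights + biases).
params = [np.zeros((4096, 4096), dtype=np.float32),
          np.zeros((4096,), dtype=np.float32)]

n_params = sum(p.size for p in params)         # 16,781,312 parameters
mib = sum(p.nbytes for p in params) / 1024**2  # ~64 MiB in float32
```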


  4. Hi,

    When trying to run my first GAN model on GPU, I got this error message:

    RuntimeError: ('The following error happened while compiling the node', GpuDnnConvGradI{algo='none', inplace=True}(GpuContiguous.0, GpuContiguous.0, GpuAllocEmpty.0, GpuDnnConvDesc{border_mode='valid', subsample=(1, 1), conv_mode='conv', precision='float32'}.0, Constant{1.0}, Constant{0.0}), '\n', 'could not create cuDNN handle: CUDNN_STATUS_INTERNAL_ERROR', "[GpuDnnConvGradI{algo='none', inplace=True}(, , , , Constant{1.0}, Constant{0.0})]")

    Any idea about that?

    Thanks,


    • Sorry, the whole error message is:

      Traceback (most recent call last):
      File "/u/mokaddem/IFT6266/DCGAN.py", line 353, in <module>
      train(1)
      File "/u/mokaddem/IFT6266/DCGAN.py", line 309, in train
      allow_input_downcast=True)
      File "/u/mokaddem/.local/lib/python2.7/site-packages/theano/compile/function.py", line 326, in function
      output_keys=output_keys)
      File "/u/mokaddem/.local/lib/python2.7/site-packages/theano/compile/pfunc.py", line 486, in pfunc
      output_keys=output_keys)
      File "/u/mokaddem/.local/lib/python2.7/site-packages/theano/compile/function_module.py", line 1795, in orig_function
      defaults)
      File "/u/mokaddem/.local/lib/python2.7/site-packages/theano/compile/function_module.py", line 1661, in create
      input_storage=input_storage_lists, storage_map=storage_map)
      File "/u/mokaddem/.local/lib/python2.7/site-packages/theano/gof/link.py", line 699, in make_thunk
      storage_map=storage_map)[:3]
      File "/u/mokaddem/.local/lib/python2.7/site-packages/theano/gof/vm.py", line 1047, in make_all
      impl=impl))
      File "/u/mokaddem/.local/lib/python2.7/site-packages/theano/gof/op.py", line 935, in make_thunk
      no_recycling)
      File "/u/mokaddem/.local/lib/python2.7/site-packages/theano/gof/op.py", line 839, in make_c_thunk
      output_storage=node_output_storage)
      File "/u/mokaddem/.local/lib/python2.7/site-packages/theano/gof/cc.py", line 1190, in make_thunk
      keep_lock=keep_lock)
      File "/u/mokaddem/.local/lib/python2.7/site-packages/theano/gof/cc.py", line 1131, in __compile__
      keep_lock=keep_lock)
      File "/u/mokaddem/.local/lib/python2.7/site-packages/theano/gof/cc.py", line 1586, in cthunk_factory
      key=key, lnk=self, keep_lock=keep_lock)
      File "/u/mokaddem/.local/lib/python2.7/site-packages/theano/gof/cmodule.py", line 1155, in module_from_key
      module = lnk.compile_cmodule(location)
      File "/u/mokaddem/.local/lib/python2.7/site-packages/theano/gof/cc.py", line 1489, in compile_cmodule
      preargs=preargs)
      File "/u/mokaddem/.local/lib/python2.7/site-packages/theano/sandbox/cuda/nvcc_compiler.py", line 417, in compile_str
      return dlimport(lib_filename)
      File "/u/mokaddem/.local/lib/python2.7/site-packages/theano/gof/cmodule.py", line 302, in dlimport
      rval = __import__(module_name, {}, {}, [module_name])
      RuntimeError: ('The following error happened while compiling the node', GpuDnnConvGradI{algo='none', inplace=True}(GpuContiguous.0, GpuContiguous.0, GpuAllocEmpty.0, GpuDnnConvDesc{border_mode='valid', subsample=(1, 1), conv_mode='conv', precision='float32'}.0, Constant{1.0}, Constant{0.0}), '\n', 'could not create cuDNN handle: CUDNN_STATUS_INTERNAL_ERROR', "[GpuDnnConvGradI{algo='none', inplace=True}(, , , , Constant{1.0}, Constant{0.0})]")

      I think it is some problem with allow_input_downcast=True, but I cannot see why.

      Thanks,


  5. There will be a conference at CRIM on GPU applications, with a speaker from Calcul Québec. All the talks will be about modern applications of GPUs to different problems. It is a free event, and from past experience, wine and snacks (fruit, cheese) will also be offered afterwards, during the networking session. Here is the link for the event:
    https://www.eventbrite.ca/e/billets-crim-atelier-r-drd-workshop-33903337796
    The number of places is limited.


  6. Hi everyone,

    I am having some very strange problems with Hades: my job doesn’t run on the GPU, it simply doesn’t use it. It works on my local machine, and the code is quite light (it runs as fast as my previous, non-GAN code).

    I contacted Calcul Québec support and they can’t tell why.

    Apparently the job ‘jumps’ from the GPU back to the CPU. (If you see that your job has only used ~5 minutes of compute time over several hours, you may have the same problem as me.)

    I am still not copying the dataset into RAM, since the code needs almost all of it.

    Also, apart from the job that runs on the CPU, if I launch a second one it throws a Segmentation Fault. (All the code works well on my machine.)

    Is anyone having the same problem? Any idea how I could fix it?

    Also, if anyone has a login on some other cluster that I could use to run these last 2 jobs, it would be wonderful! (Am I hoping for too much? haha)

    Thanks and good luck.


  7. I’m just curious: have the grades for the final been posted yet? I don’t think I have access to Studium (it says I’m not taking any courses), so I have no way of checking.


  8. Hi everyone,

    As this is the last day, I am looking for a good metric to quantitatively measure image quality in my results.

    I am currently using the universal image quality index. It is generally used in computer vision to compare a noisy image to its ground-truth version. Unfortunately, this metric only makes sense when you have a ground truth to compare the generated image to, and when both depict the same scene. In my case, the generator sometimes generates an image that does not look similar to the ground truth at all.

    Thanks in advance,

    Here is a paper on the universal image quality index, for those interested:
    Prashanth, H. S., Shashidhara, H. L., & KN, B. M. (2009, December). Image scaling comparison using universal image quality index. In Advances in Computing, Control, & Telecommunication Technologies, 2009. ACT’09. International Conference on (pp. 859-863). IEEE.
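    For those who want to try it, the index itself (due to Wang & Bovik) is only a few lines. A minimal NumPy sketch, computed globally over the whole image rather than with the sliding window the literature uses:

```python
import numpy as np

def uqi(x, y):
    """Universal image quality index between two images; 1.0 means identical."""
    x = x.astype(np.float64).ravel()
    y = y.astype(np.float64).ravel()
    mx, my = x.mean(), y.mean()
    vx, vy = x.var(), y.var()
    cov = ((x - mx) * (y - my)).mean()
    # Combines correlation, luminance and contrast terms in one ratio.
    return 4 * cov * mx * my / ((vx + vy) * (mx ** 2 + my ** 2))
```

    As noted above, it only makes sense against a ground truth of the same scene; for unconditional samples it will not tell you much.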


  9. There is a guy from Texas, Alan Bovik, who has done (and still does) a lot of work on image and video quality evaluation; you can see some of his work and implementations here. On their webpage you can find Matlab code for a large number of indexes, both reference-based and no-reference:
    http://live.ece.utexas.edu/research/quality/
    There is also a VIF index, but I believe it is reference-based. They have actually published a lot of papers on both images and videos. Some of the indexes are based on wavelet decomposition.


  10. Hi everyone,

    In this course (and in the deep learning community in general, from what I can gather), we focus on making a neural net as efficient as possible at a given task. However, there are also interesting perspectives in integrating deep learning architectures with our current knowledge of neuroscience. Indeed, deep learning gives us a nice tool to build models of (and thus potentially better understand) how the brain works! Here is an interesting paper on the subject: http://journal.frontiersin.org/article/10.3389/fncom.2016.00094/full.

    I, for one, have a background in cognitive psychology and took this class to increase my brain-modeling toolbox! While it is obvious how deep learning could be used to model some areas of the brain, such as the ventral stream, which is implicated in object recognition and includes the primary visual cortex (V1), extrastriate areas V2 and V4 and the inferior temporal gyrus (for example, see https://arxiv.org/pdf/1702.07800.pdf), it could probably be used for a lot more! However, it might then be necessary to move our interest from making the networks “deeper” and more powerful to thinking more in terms of cognitive architectures. Recent interest in attentional processes within the deep learning community suggests that the two fields are meant to meet more closely in the near future!

    Moreover, an approach that has been gaining a lot of ground recently in cognitive psychology is the “embodied approach”, where categories are considered to be distributed in a sensorimotor representational space: categories are described in terms of sensorimotor (including action-effect) contingencies, instead of static shapes. If anyone knows of papers in the deep learning literature that implement this type of dynamic representation, please let me know!! 🙂


    • Hey Etienne,

      Very interesting article on the link of neuroscience and deep learning; and your post about the CNN+FFT paper too. Thanks for sharing!

      Vasken


  11. Hi again,

    Here is a very interesting article presenting CNNs with filter representations distributed directly in the frequency domain: https://arxiv.org/pdf/1506.03767.pdf.

    One advantage of this type of representation is that the filters tend to be sparser in their spectral representations than they would be in a spatial-domain representation. This provides the optimizer with more meaningful axis-aligned directions that can be taken advantage of with standard element-wise preconditioning.

    Moreover, the paper presents the concept of “spectral pooling”. It performs dimensionality reduction by projecting onto the frequency basis set and then truncating the representation. This type of pooling presents the advantage of preserving considerably more information than “max pooling” for the same number of parameters.

    Not only is it very interesting to think in terms of the frequency domain for image analysis, but intuitively, it seems to me that it may eventually allow us to analyze sounds with classic CNNs instead of temporal networks such as RNNs!
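    For anyone curious, spectral pooling is only a few lines with NumPy: take the 2-D FFT, keep a centered low-frequency window, and transform back. A sketch assuming a single-channel image and even sizes (not the paper’s exact implementation):

```python
import numpy as np

def spectral_pool(img, out_h, out_w):
    """Downsample an image by truncating its centered 2-D Fourier spectrum."""
    h, w = img.shape
    F = np.fft.fftshift(np.fft.fft2(img))           # low frequencies in the middle
    top, left = (h - out_h) // 2, (w - out_w) // 2
    F_crop = F[top:top + out_h, left:left + out_w]  # keep the low-frequency window
    out = np.fft.ifft2(np.fft.ifftshift(F_crop))
    return np.real(out) * (out_h * out_w) / (h * w)  # rescale for the size change
```

    On a constant image this returns the same constant at the smaller size; on natural images it preserves smooth structure that max pooling would discard.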


  12. Hi, I see that we’ve received our final mark for the class. Will we have access to the separate grades (exam, questions, project)? Thanks!


    • Hi, how did you receive the final mark?
      I did not receive anything, but that may be because I am not a student at Université de Montréal…
      Thanks!


      • Yes, I saw the final mark for the class in my UdeM student profile: “Synchro – Centre Étudiant”.

