In this lecture, we will have a rather detailed discussion of optimization methods and their interpretation.
Slides:
Reference: (* = you are responsible for this material)
- *Chapter 8 of the Deep Learning textbook.
Hello. The link for the slides is broken at the moment 🙂
Cheers.
Has anyone applied evolutionary algorithms (e.g. Genetic Algorithms, Differential Evolution, PSO, CMA-ES) to train DNNs? I can’t find many works in that direction. Is there any big issue in doing so? High number of parameters?
Although I don’t know much about this topic myself, I believe what you are looking for is the field of neuroevolution. https://en.wikipedia.org/wiki/Neuroevolution
The issue with optimization using evolutionary algorithms is that it is much slower than backpropagation for large networks. For example, if a neural network has 100 000 weights, then the gradient effectively gives 100 000 hints (one per weight) to help adjust them. In contrast, a genetic algorithm only uses the final ‘fitness’ score of the whole network to assess its performance and tune the weights.
I would note that genetic algorithms may be useful to optimize discrete quantities (like the number of hidden units) which backpropagation cannot be used for.
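To make the "one hint per weight" point concrete, here is a rough sketch rather than a benchmark. It assumes a toy sum-of-squares loss in place of a real network, and it uses a minimal (1+1) evolution strategy (mutate, keep the candidate if the fitness improves) as a simple stand-in for the genetic algorithms discussed above. The function names, step size, mutation scale, and seed are all illustrative choices.

```python
import random

def loss(w):
    # Toy "fitness": sum of squares, minimised at w = 0 (stand-in for a network loss).
    return sum(x * x for x in w)

def gradient_descent(w, steps, lr=0.1):
    for _ in range(steps):
        # The gradient of the toy loss is 2*w: one signed hint per weight.
        w = [x - lr * 2.0 * x for x in w]
    return w

def one_plus_one_es(w, steps, sigma=0.1, seed=0):
    # Hypothetical minimal evolution strategy: it only ever sees the scalar loss.
    rng = random.Random(seed)
    best = loss(w)
    for _ in range(steps):
        candidate = [x + rng.gauss(0.0, sigma) for x in w]
        score = loss(candidate)
        if score < best:  # keep the mutation only if fitness improves
            w, best = candidate, score
    return w

n_weights, steps = 100, 200
start = [1.0] * n_weights
gd_loss = loss(gradient_descent(start, steps))
es_loss = loss(one_plus_one_es(start, steps))
print(f"gradient descent: {gd_loss:.3e}   (1+1)-ES: {es_loss:.3e}")
```

With the same number of update steps, the gradient-based run should end far closer to the minimum, because each step moves every weight in a useful direction, while the mutation-based run has to guess a good direction in 100 dimensions from a single number.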
Thanks for the answer.
I just saw this paper on the subject and would like to share it:
https://arxiv.org/abs/1703.01041
Regarding Adam, I asked in class: why are the moment estimators biased, and how do the following expressions correct that bias? It was not evident to me at the time, just from looking at the pseudo-code.
Aaron pointed out that, taking the first iteration as an example, the estimate is initialized to
s = 0,
so after the first update “s” becomes:
s = rho_1*0 + (1-rho_1)*h = (1-rho_1)*h,
which is smaller in magnitude than h. In this iteration, dividing by (1-rho_1^1) corrects the estimate, so that s = h. The same applies to the second-moment estimate. The same argument extends to later iterations as well, of course.
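The argument can be checked numerically. The sketch below is an illustration, not the course pseudo-code: it assumes h is the gradient and holds it constant across steps, in which case the uncorrected estimate after t steps equals (1-rho_1^t)*h and the corrected one equals h. The function name and the value rho_1 = 0.9 are assumptions made for the example.

```python
def first_moment_estimates(grads, rho1=0.9):
    """Return a (biased, corrected) pair of first-moment estimates for each step."""
    s = 0.0  # initialised at zero, which is where the bias comes from
    history = []
    for t, h in enumerate(grads, start=1):
        s = rho1 * s + (1.0 - rho1) * h   # exponential moving average of the gradient
        s_hat = s / (1.0 - rho1 ** t)     # bias correction: divide by (1 - rho_1^t)
        history.append((s, s_hat))
    return history

# Constant gradient h = 2.0 (an assumption made to keep the check exact).
estimates = first_moment_estimates([2.0, 2.0, 2.0])
for t, (s, s_hat) in enumerate(estimates, start=1):
    print(f"t={t}  biased s={s:.4f}  corrected s={s_hat:.4f}")
```

The uncorrected values start well below h and creep towards it, while the corrected values sit at h from the first step. With a gradient that changes between steps, the correction removes the bias only in expectation.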
Note: This comment is only meant to leave a trace on the blog of something that I asked in class. I frequently asked questions, but was very shy about coming to the blog and posting them.