14/15 – Optimization

In this lecture, we will have a rather detailed discussion of optimization methods and their interpretation.

Slides:

Reference: (* = you are responsible for this material)

5 thoughts on “14/15 – Optimization”

  1. Has anyone applied evolutionary algorithms (e.g. Genetic Algorithms, Differential Evolution, PSO, CMA-ES) to train DNNs? I can’t find much work in that direction. Is there some big issue with doing so, such as the high number of parameters?


    • Although I don’t know much about this topic myself, I believe what you are looking for is the field of neuroevolution. https://en.wikipedia.org/wiki/Neuroevolution

      The issue with optimization using evolutionary algorithms is that it is much slower than backpropagation for large networks. For example, if a neural network has 100 000 weights, then the gradient effectively gives 100 000 hints (one per weight) to help adjust them. In contrast, a genetic algorithm only uses the final ‘fitness’ score of the whole network to assess its performance and tune the weights.

      I would note that genetic algorithms may still be useful for optimizing discrete quantities (like the number of hidden units), for which backpropagation cannot be used.
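
      To make the contrast concrete, here is a minimal toy sketch (my own illustration, not from the lecture; it assumes numpy, and the “network” is just a quadratic loss over a weight vector rather than a real DNN). It compares a gradient step, which gets one hint per weight, with a simple (1+1) evolution strategy, which only sees the scalar fitness:

        # Toy comparison: per-weight gradient signal vs. fitness-only evolutionary search.
        # The "network" is a quadratic loss over a weight vector, standing in for a DNN.
        import numpy as np

        rng = np.random.default_rng(0)
        dim = 1000                       # number of "weights"
        target = rng.normal(size=dim)    # values the weights should reach

        def loss(w):
            return float(np.sum((w - target) ** 2))

        # Gradient descent: each evaluation yields dim numbers (one hint per weight).
        w_gd = np.zeros(dim)
        for _ in range(100):
            grad = 2 * (w_gd - target)
            w_gd -= 0.1 * grad

        # (1+1) evolution strategy: each evaluation yields a single fitness number.
        w_es = np.zeros(dim)
        for _ in range(100):
            candidate = w_es + 0.1 * rng.normal(size=dim)
            if loss(candidate) < loss(w_es):
                w_es = candidate

        print("gradient descent loss:", loss(w_gd))
        print("(1+1)-ES loss:        ", loss(w_es))

      With the same budget of 100 iterations, the gradient-based weights end up essentially at the optimum, while the fitness-only search barely improves, which is the scaling problem described above.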


  2. Regarding Adam, I asked in class: why is there a bias in the moment estimators, and how do the correction expressions fix it? It was not evident to me at the time, just from looking at the pseudo-code.

    Aaron pointed out that, taking the first iteration as an example, the accumulator is initialized to

    s = 0,

    so after the update s = rho_1*s + (1-rho_1)*h, the first estimate is

    s = (1-rho_1)*h,

    which is smaller in magnitude than h, i.e. biased toward zero. Dividing by (1-rho_1^1) corrects the estimate, so that s_hat = h. The same applies to the second-moment estimate with rho_2, and the correction can be proven for further iterations as well; see the small numerical check at the end of this comment.

    Note: This comment is only to leave a trace on the blog of something I asked in class. I frequently asked questions, but was too shy to come to the blog and post them.
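
    As a quick sanity check of the argument above, here is a small numerical sketch (not from the lecture; it assumes a constant gradient h and rho_1 = 0.9, with names matching the comment) showing that the raw accumulator is biased toward zero while the corrected estimate recovers h at every step:

      # Numerical check of Adam's bias correction for the first-moment estimate.
      # Assumptions: constant gradient h and decay rate rho_1 = 0.9 (illustration only).
      rho_1 = 0.9
      h = 5.0       # pretend the gradient is the same at every iteration
      s = 0.0       # first-moment accumulator, initialized to zero

      for t in range(1, 6):
          s = rho_1 * s + (1 - rho_1) * h    # biased toward 0 because s starts at 0
          s_hat = s / (1 - rho_1 ** t)       # bias-corrected estimate
          print(f"t={t}: biased s = {s:.4f}, corrected s_hat = {s_hat:.4f}")

    For a constant gradient, the corrected s_hat equals h exactly at every step, while the biased s only approaches it geometrically.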

