In this lecture, Dzmitry (Dima) Bahdanau will discuss attention and memory in neural networks.
Slides:
- Attention Models in Deep Learning by D. Bahdanau.
References (* = you are responsible for this material):
- *Section 12.4.5 of the Deep Learning textbook.
- *Attention and Augmented Recurrent Neural Networks, a blog post by Chris Olah and Shan Carter, Sept. 2016.
- *Neural Machine Translation by Jointly Learning to Align and Translate, D. Bahdanau, K. Cho, Y. Bengio, ICLR 2015
- *Section 5 of Generating Sequences with Recurrent Neural Networks, A. Graves, arXiv
- Connectionist Temporal Classification: Labelling Unsegmented Sequence Data with Recurrent Neural Networks, A. Graves, S. Fernandez, F. Gomez, J. Schmidhuber, ICML 2006
- Part II: Visual attention models
- Recurrent Models of Visual Attention, V. Mnih, N. Heess, A. Graves, K. Kavukcuoglu, NIPS 2014
- DRAW: A Recurrent Neural Network for Image Generation, K. Gregor, I. Danihelka, A. Graves, D. J. Rezende, D. Wierstra, ICML 2015
- Part III [if time permits]
- Memory Networks, J. Weston, S. Chopra, A. Bordes, ICLR 2015
- Neural Turing Machines, A. Graves, G. Wayne, I. Danihelka, arXiv
If some of you are interested, there is this paper, “Frustratingly Short Attention Spans in Neural Language Modeling” (https://arxiv.org/pdf/1702.04521.pdf). Briefly, in the context of language modeling, instead of using the whole hidden state of the model for both the attention look-up and the output, they split the state into a key-value pair: the key is used only for the attention, and the value is used only for the output.
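A minimal sketch of that key-value split (not the authors' code): each hidden state is cut in half, the first half (key) is used only to score attention against past positions, and the second half (value) is used only to build the context vector that feeds the output layer. The dimensions and the dot-product scoring rule below are illustrative assumptions.

```python
import numpy as np

def key_value_attention(states, query_state):
    """states: (T, 2d) past hidden states; query_state: (2d,) current state."""
    d = states.shape[1] // 2
    keys, values = states[:, :d], states[:, d:]   # split each state into (key, value)
    query_key = query_state[:d]                   # only the key half is used for scoring

    scores = keys @ query_key                     # (T,) dot-product attention scores
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()                      # softmax over past positions

    context = weights @ values                    # (d,) context built from value halves only
    return context, weights

# toy usage: 5 past states with 2d = 8
rng = np.random.default_rng(0)
past = rng.standard_normal((5, 8))
current = rng.standard_normal(8)
context, attn = key_value_attention(past, current)
print(context.shape, attn.sum())                  # (4,) 1.0
```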