08 / 10 – Sequential Models: Recurrent and Recursive Nets

In this lecture we introduce Recurrent Neural Networks.

Lecture 08 RNNs (slides from Hugo Larochelle)

Reference: (* = you are responsible for this material)


10 thoughts on “08 / 10 – Sequential Models: Recurrent and Recursive Nets”

  1. In DeepLearningBook (10.2.1 – 10.2.2), two different recurrent nets are compared:
    1 – With hidden-to-hidden connections
    2 – Only with output-to-hidden connections (no hidden-to-hidden connections)

    It says that the gradient is easier to compute in the second case, because “there is no need to compute the output for the previous time step first, because the training set provides the ideal value of that output”.
    I understand that if the computation can be parallelized, it takes less time. But I don’t understand what the big difference between these two architectures is (such that computing the gradient becomes independent of the previous states).


    • Refer to Figure 10.4 in the text. Let’s say we are considering the state at time step t+1. The hidden layer h(t+1) has only two inputs, x(t+1) and o(t). We already have the ideal value of o(t) from the training set. If we feed the true output o(t)_true into the state at t+1 instead of the predicted o(t), the time steps become decoupled and the gradient for each step can be computed in isolation.


    • In addition to the potential computational gains, I think that conditioning on a ‘ground truth’ from the training data reduces the bias in the gradients used for your parameter updates.
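The decoupling described in the replies above can be sketched with a scalar toy network. This is only an illustration, assuming an output-to-hidden recurrence of the form h(t) = tanh(w_x x(t) + w_o o(t-1)), o(t) = v h(t). The weights `w_x`, `w_o`, `v`, the data, and the function names are all made up for the example and are not taken from the book.

```python
import math

def step(x_t, o_prev, w_x=0.5, w_o=0.8, v=1.2):
    """One step of an output-to-hidden RNN (scalar toy; weights are made up)."""
    h_t = math.tanh(w_x * x_t + w_o * o_prev)
    return v * h_t

def free_running(xs, o_init=0.0):
    """Each step consumes the model's own previous output: inherently sequential."""
    outputs, o_prev = [], o_init
    for x_t in xs:
        o_prev = step(x_t, o_prev)
        outputs.append(o_prev)
    return outputs

def teacher_forced(xs, ys_true, o_init=0.0):
    """Each step consumes the true previous output, so steps are independent."""
    prev_targets = [o_init] + list(ys_true[:-1])
    # No step needs another step's result, so these could run in any order.
    return [step(x_t, y_prev) for x_t, y_prev in zip(xs, prev_targets)]

xs = [1.0, -0.5, 0.3, 0.9]        # hypothetical inputs
ys_true = [0.7, -0.2, 0.4, 0.8]   # hypothetical training targets

forced = teacher_forced(xs, ys_true)
# Evaluating the third step alone matches evaluating it as part of the sequence.
step3_alone = step(xs[2], ys_true[1])
print(forced[2] == step3_alone)
```

In `free_running` the loop variable `o_prev` carries a dependency from one step to the next, while in `teacher_forced` every call to `step` uses only values already present in the training set. The same holds for the loss and gradient at each step, which is why they can be computed in isolation.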


  2. What is the difference between GRU and LSTM?

    1. A GRU has two gates (a reset gate r and an update gate z), while an LSTM has three gates, so an LSTM has more parameters. What the GRU lacks is the controlled exposure of the memory content (controlled by the output gate in the LSTM): the GRU exposes its full content without any control.

    2. The LSTM unit computes the new memory content without any separate control of the amount of information flowing from the previous time step. Rather, the LSTM unit controls the amount of new memory content being added to the memory cell independently of the forget gate. The GRU, on the other hand, controls the information flow from the previous activation when computing the new candidate activation (the activation in a GRU is a linear interpolation between the previous activation and the candidate activation), but does not independently control the amount of the candidate activation being added (that control is tied to the update gate).

    For a detailed description, see this research paper, which explains all of this brilliantly: https://arxiv.org/pdf/1412.3555v1.pdf
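The two points above can be made concrete with scalar versions of both units. This is a rough sketch that follows the usual textbook form of the update equations. The parameter names (`w_r`, `u_r`, and so on) and the value 0.5 are hypothetical, and biases are omitted for brevity.

```python
import math

def sigmoid(a):
    return 1.0 / (1.0 + math.exp(-a))

def gru_step(x, h_prev, p):
    """Scalar GRU step: two gates (reset r, update z), no separate memory cell."""
    r = sigmoid(p["w_r"] * x + p["u_r"] * h_prev)
    z = sigmoid(p["w_z"] * x + p["u_z"] * h_prev)
    # The reset gate controls how much of the previous state enters the candidate.
    h_cand = math.tanh(p["w_h"] * x + p["u_h"] * (r * h_prev))
    # Linear interpolation: the single gate z sets both what is kept and what is added.
    return (1.0 - z) * h_prev + z * h_cand

def lstm_step(x, h_prev, c_prev, p):
    """Scalar LSTM step: three gates (input i, forget f, output o) plus a cell c."""
    i = sigmoid(p["w_i"] * x + p["u_i"] * h_prev)
    f = sigmoid(p["w_f"] * x + p["u_f"] * h_prev)
    o = sigmoid(p["w_o"] * x + p["u_o"] * h_prev)
    # The candidate sees h_prev directly: there is no reset gate here.
    c_cand = math.tanh(p["w_c"] * x + p["u_c"] * h_prev)
    # Keeping (f) and adding (i) are controlled independently.
    c = f * c_prev + i * c_cand
    # The output gate controls how much of the memory content is exposed.
    h = o * math.tanh(c)
    return h, c

# Hypothetical parameter values, chosen only so the functions can be called.
gru_params = {k: 0.5 for k in ("w_r", "u_r", "w_z", "u_z", "w_h", "u_h")}
lstm_params = {k: 0.5 for k in ("w_i", "u_i", "w_f", "u_f",
                                "w_o", "u_o", "w_c", "u_c")}

print(gru_step(1.0, 0.2, gru_params))
print(lstm_step(1.0, 0.2, 1.0, lstm_params))
print(len(gru_params), len(lstm_params))
```

The GRU needs three sets of weights (two gates and the candidate) and the LSTM needs four (three gates and the candidate), which is where the difference in parameter count comes from.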


  3. I was wondering if someone with experience of teacher forcing could answer my question. Since at test time we cannot use the true output as an input for the sequence being predicted, could using the true data during training actually hurt the model’s ability to generalize?
    Isn’t the model expecting to receive the ground truth, so that it performs worse without it than if it had been trained using only its own predictions?


  4. When using the RNN for prediction, the ground-truth sequence is not available for conditioning, so we sample from the joint distribution over the sequence by sampling each y_t from its conditional distribution given the previously generated samples. Unfortunately, this procedure can cause problems in generation, because small prediction errors compound in the conditioning context. This can lead to poor prediction performance as the RNN’s conditioning context (the sequence of previously generated samples) diverges from the sequences seen during training.

    There are a couple of papers which address this issue:

    Scheduled Sampling
    Professor Forcing – A New Algorithm for Training Recurrent Networks
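The idea behind scheduled sampling can be sketched as a coin flip at each time step during training: with probability `eps` the true previous value is fed in, otherwise the model's own previous prediction is used. This is a simplified illustration and not the paper's implementation. `toy_model`, `choose_inputs`, `linear_eps`, and the data are hypothetical stand-ins, and the linear decay is just one possible schedule.

```python
import random

def toy_model(y_prev):
    """Hypothetical stand-in for a one-step predictor (not a trained network)."""
    return 0.9 * y_prev + 0.1

def linear_eps(train_step, total_steps, eps_min=0.0):
    """Assumed schedule: start fully teacher-forced, decay linearly to eps_min."""
    return max(eps_min, 1.0 - train_step / total_steps)

def choose_inputs(ys_true, eps, rng, y_init=0.0):
    """Pick the conditioning input for each step of one training sequence.

    With probability eps feed the true previous value (teacher forcing);
    otherwise feed the model's own previous prediction.
    """
    inputs, preds = [], []
    prev_true, prev_pred = y_init, y_init
    for y_t in ys_true:
        fed = prev_true if rng.random() < eps else prev_pred
        pred = toy_model(fed)
        inputs.append(fed)
        preds.append(pred)
        prev_true, prev_pred = y_t, pred
    return inputs, preds

rng = random.Random(0)
ys_true = [0.7, -0.2, 0.4, 0.8]  # hypothetical target sequence

for train_step in (0, 50, 100):
    eps = linear_eps(train_step, total_steps=100)
    inputs, _ = choose_inputs(ys_true, eps, rng)
    print(train_step, eps, inputs)
```

At `eps = 1.0` this reduces to plain teacher forcing, and at `eps = 0.0` the model is conditioned only on its own predictions, as it is at test time. Moving gradually between the two is meant to expose the model to its own mistakes during training.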

