Q16 – Linear RNN Dynamics

Consider the behavior of a linear RNN:
h_t = W h_{t-1} + U x_{t} + b

1.  Write h_t as a function of h_0.
2.  Write out \frac{d h_t}{d h_0}.
3.  What happens when t \to \infty? Under what conditions?

5 thoughts on “Q16 – Linear RNN Dynamics”

  1. 1) h_t = W^t h_0 + W^{t-1} (U x_1 + b) + W^{t-2} (U x_2 + b) + … + W (U x_{t-1} + b) + (U x_t + b)
    2) \frac{d h_t}{d h_0} = W^t
    3) if the elements of W are greater than 1, the gradient explodes

    Is this right or am I missing something?
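One way to sanity-check answers 1) and 2) is to compare them numerically against the recurrence itself. The sketch below is only an illustration: the 2-d values of `W`, `U`, `bias`, `xs` and `h0` are made-up numbers, and the helper names (`matvec`, `matpow`, `run_recurrence`, `unrolled`, `jacobian_fd`) are hypothetical, not from the post.

```python
# Illustrative 2-d example; W, U, bias, xs and h0 are made-up numbers.

def matvec(A, v):
    """Matrix-vector product for lists of lists."""
    return [sum(a * x for a, x in zip(row, v)) for row in A]

def matmul(A, B):
    """Matrix-matrix product for lists of lists."""
    cols = list(zip(*B))
    return [[sum(a * c for a, c in zip(row, col)) for col in cols] for row in A]

def matpow(A, k):
    """A raised to a non-negative integer power k (k = 0 gives the identity)."""
    n = len(A)
    result = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for _ in range(k):
        result = matmul(result, A)
    return result

def vadd(u, v):
    """Element-wise sum of two vectors."""
    return [a + c for a, c in zip(u, v)]

W = [[0.5, 0.1], [0.0, 0.8]]
U = [[1.0, 0.0], [0.0, 1.0]]
bias = [0.1, -0.2]
xs = [[1.0, 2.0], [0.5, -1.0], [0.0, 3.0]]  # x_1, x_2, x_3
h0 = [1.0, -1.0]

def run_recurrence(h_init, inputs):
    """Apply h_t = W h_{t-1} + U x_t + b once per input."""
    h = h_init
    for x in inputs:
        h = vadd(vadd(matvec(W, h), matvec(U, x)), bias)
    return h

def unrolled(h_init, inputs):
    """Closed form: W^t h_0 plus the sum over k of W^(t-k) (U x_k + b)."""
    t = len(inputs)
    h = matvec(matpow(W, t), h_init)
    for k, x in enumerate(inputs, start=1):
        drive = vadd(matvec(U, x), bias)
        h = vadd(h, matvec(matpow(W, t - k), drive))
    return h

def jacobian_fd(h_init, inputs, eps=1e-6):
    """Finite-difference estimate of d h_t / d h_0, one column per h_0 entry."""
    base = run_recurrence(h_init, inputs)
    n = len(h_init)
    J = [[0.0] * n for _ in range(n)]
    for j in range(n):
        bumped = list(h_init)
        bumped[j] += eps
        out = run_recurrence(bumped, inputs)
        for i in range(n):
            J[i][j] = (out[i] - base[i]) / eps
    return J

t = len(xs)
# Largest difference between the recurrence and the closed form.
max_gap = max(abs(a - c) for a, c in zip(run_recurrence(h0, xs), unrolled(h0, xs)))
# Largest difference between the finite-difference Jacobian and W^t.
W_t = matpow(W, t)
J = jacobian_fd(h0, xs)
jac_gap = max(abs(J[i][j] - W_t[i][j]) for i in range(2) for j in range(2))
print(max_gap, jac_gap)
```

Both gaps should be at the level of floating-point noise. Note that the last term of the closed form carries W^0, i.e. the most recent input enters without being multiplied by W.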

    • For 3), I think you can refer to sec. 10.7 of the textbook. Roughly, W can be eigendecomposed into
      W = Q \Lambda Q^T with orthogonal Q, so W^t = Q \Lambda^t Q^T. So any feature aligned with an eigenvalue 1, the gradient will explode.

      • Part of my previous comment got messed up:

        So any feature aligned with an eigenvalue of magnitude less than 1 will vanish, and if any eigenvalue has magnitude greater than 1, the gradient will explode.
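The eigenvalue argument can be illustrated with a small numerical sketch. It assumes a symmetric 2-d W built as Q \Lambda Q^T from a rotation Q; the eigenvalues 0.5 and 1.5, the angle, and the horizon t = 20 are arbitrary choices for illustration, and the helper names are hypothetical.

```python
import math

def matvec(A, v):
    """Matrix-vector product for lists of lists."""
    return [sum(a * x for a, x in zip(row, v)) for row in A]

def norm(v):
    """Euclidean norm of a vector."""
    return math.sqrt(sum(x * x for x in v))

# Orthonormal eigenvectors (the columns of a rotation matrix Q); assumed values.
theta = math.pi / 6
c, s = math.cos(theta), math.sin(theta)
q_small = [c, s]    # eigenvector paired with eigenvalue 0.5
q_large = [-s, c]   # eigenvector paired with eigenvalue 1.5
lam_small, lam_large = 0.5, 1.5

# W = Q Lambda Q^T, written as a sum of rank-one terms lambda_i q_i q_i^T.
W = [[lam_small * q_small[i] * q_small[j] + lam_large * q_large[i] * q_large[j]
      for j in range(2)] for i in range(2)]

def apply_t(v, t):
    """Return W^t v by applying W to v t times."""
    for _ in range(t):
        v = matvec(W, v)
    return v

t = 20
gain_small = norm(apply_t(q_small, t))  # shrinks like 0.5 ** t
gain_large = norm(apply_t(q_large, t))  # grows like 1.5 ** t
print(gain_small, gain_large)
```

Since \frac{d h_t}{d h_0} = W^t, these two gains are the factors by which a gradient component along each eigenvector is scaled after t steps: one direction vanishes, the other explodes.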

    • 3 is tricky. An eigenvalue greater than one doesn’t imply explosions, e.g. if h0 is orthogonal to all the eigenvectors with eigenvalue greater than one.
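A minimal sketch of this caveat, under simplifying assumptions: W is diagonal with eigenvalues 0.5 and 1.5, the inputs and bias are set to zero so that h_t = W^t h_0, and the two starting states are made-up values. The helper names are hypothetical.

```python
import math

def matvec(A, v):
    """Matrix-vector product for lists of lists."""
    return [sum(a * x for a, x in zip(row, v)) for row in A]

def norm(v):
    """Euclidean norm of a vector."""
    return math.sqrt(sum(x * x for x in v))

# Diagonal W: eigenvectors are the coordinate axes, eigenvalues 0.5 and 1.5.
W = [[0.5, 0.0], [0.0, 1.5]]

def evolve(h, t):
    """Return W^t h, i.e. the state after t steps with zero input and bias."""
    for _ in range(t):
        h = matvec(W, h)
    return h

t = 60
h0_orthogonal = [1.0, 0.0]   # no component along the eigenvalue-1.5 direction
h0_perturbed = [1.0, 1e-6]   # tiny component along that direction

norm_orthogonal = norm(evolve(h0_orthogonal, t))  # decays like 0.5 ** t
norm_perturbed = norm(evolve(h0_perturbed, t))    # dominated by 1e-6 * 1.5 ** t
print(norm_orthogonal, norm_perturbed)
```

In this setup the exactly orthogonal start decays, while even a tiny component along the large-eigenvalue direction eventually dominates. This concerns the state h_t; the Jacobian W^t from part 2 does not depend on h_0.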
