
Computer Vision for Developers: How to Solve Vanishing and Exploding Gradients in RNNs for Text Summarization?

Discover expert solutions to gradient issues in RNNs for text summarization. Learn why linear self-connections with weights near one enable multi-time-scale models and stabilize training.

Question

You are using a recurrent neural network (RNN) for a text summarization task. During training, you observe that the network's non-linear dynamics cause gradients to either vanish or, in rare cases, explode, depending on the magnitude of the eigenvalues of the recurrent weight matrix. To address this issue, you must design a model that operates at multiple time scales. What steps will you take?

A. Introduce leaky units with non-linear self-connections, where the weight is close to zero.
B. Actively remove all variable connections between past and present variables.
C. Actively replace all existing network connections with shorter connections.
D. Introduce hidden units with linear self-connections, where the weight is close to one.

Answer

D. Introduce hidden units with linear self-connections, where the weight is close to one.

Explanation

Vanishing and exploding gradients in recurrent neural networks (RNNs) arise from repeated multiplication of gradients during backpropagation through time. This destabilizes training, especially in tasks like text summarization where long-term dependencies are critical. Introducing hidden units with linear self-connections (weight ≈ 1) keeps gradient magnitudes roughly constant across many time steps, so information can persist over long sequences.
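
To see the mechanism concretely, here is a small NumPy sketch (illustrative, not taken from the question) that backpropagates a gradient vector through the same recurrent Jacobian 100 times; whether the norm shrinks or blows up depends on the largest eigenvalue magnitude.

```python
# Minimal sketch: repeated multiplication by the recurrent Jacobian during
# backpropagation through time. The matrix and vector are random; only the
# spectral radius (largest |eigenvalue|) is controlled.
import numpy as np

rng = np.random.default_rng(0)

def gradient_norm_after(steps, spectral_radius, size=32):
    W = rng.standard_normal((size, size))
    W *= spectral_radius / np.max(np.abs(np.linalg.eigvals(W)))  # set largest |eigenvalue|
    grad = rng.standard_normal(size)
    for _ in range(steps):
        grad = W.T @ grad          # one backward step through time
    return np.linalg.norm(grad)

for rho in (0.9, 1.0, 1.1):
    print(f"largest |eigenvalue| = {rho}: gradient norm after 100 steps = "
          f"{gradient_norm_after(100, rho):.3e}")
# Below one the norm collapses toward zero (vanishing); above one it grows
# without bound (exploding); only near one does it stay usable.
```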

Why Option D is Correct

Linear Self-Connections Mitigate Gradient Issues

  • Weights close to one preserve gradient magnitudes over time, enabling long-term memory retention.
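
As a quick worked example (the numbers are my own, chosen for illustration), the gradient carried across T steps by a single linear self-connection of weight w is simply w**T:

```python
# Gradient of h_T with respect to h_0 for the linear update h_t = w * h_{t-1} + input_t.
for w in (0.5, 0.9, 0.99):
    print(f"w = {w}: gradient after 100 steps = {w ** 100:.3e}")
# w = 0.5 leaves ~8e-31 (the signal is gone); w = 0.99 leaves ~0.37 (still usable).
```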

Multi-Time-Scale Dynamics

Units with linear self-connections evolve slowly (long time scales), while non-linear units adapt quickly (short time scales). This mimics architectures like LSTMs, where a “cell state” uses linear paths to bypass gradient decay.
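
Below is a minimal sketch of such a leaky unit, assuming the common update h_t = alpha * h_(t-1) + (1 - alpha) * tanh(x_t); alpha is the linear self-connection weight and sets the unit's time scale.

```python
import numpy as np

def run_leaky_unit(inputs, alpha):
    """Leaky unit: linear self-connection of weight alpha plus a non-linear input term."""
    h, history = 0.0, []
    for x in inputs:
        h = alpha * h + (1.0 - alpha) * np.tanh(x)
        history.append(h)
    return np.array(history)

x = np.concatenate([np.ones(50), np.zeros(50)])  # step input that switches off halfway
slow = run_leaky_unit(x, alpha=0.98)             # long time scale: remembers the earlier input
fast = run_leaky_unit(x, alpha=0.30)             # short time scale: forgets almost immediately
print("50 steps after the input stops -> slow unit:", round(slow[-1], 3),
      "| fast unit:", round(fast[-1], 8))
```

Using both kinds of unit in one network is what gives the model multiple time scales: fast units track local wording while slow units carry document-level context.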

Empirical Support

LSTMs and GRUs use gated linear connections to solve vanishing gradients, validating the principle of preserving gradients via linearity.

Batch normalization and gradient clipping (common fixes) address symptoms, but linear self-connections target the root cause.
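
For contrast, here is a hedged PyTorch sketch (the model, data, and loss are placeholders) combining an LSTM, whose gated cell state supplies the linear path, with gradient-norm clipping, which only caps the exploding symptom:

```python
import torch
import torch.nn as nn

model = nn.LSTM(input_size=128, hidden_size=256, batch_first=True)  # gated linear cell-state path
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

x = torch.randn(8, 40, 128)        # dummy batch: (batch, time, features)
output, _ = model(x)
loss = output.pow(2).mean()        # placeholder loss, just to produce gradients

optimizer.zero_grad()
loss.backward()
torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0)  # caps exploding gradients only
optimizer.step()
```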

Why Other Options Fail

A (Leaky Units with Non-Linear Connections): A self-connection weight close to zero makes the unit forget its past state almost immediately, so no long time scale is created, and the non-linearity can still saturate and further choke gradient flow.

B (Remove Past-Present Connections): Destroys the RNN’s ability to model sequences.

C (Shorter Connections): Reduces network depth but fails to address time-scale separation.

Answer: D (Introduce hidden units with linear self-connections, weight ≈ 1) directly enables stable multi-time-scale learning, resolving gradient issues while preserving sequential modeling.

Final Tip: For robust text summarization, combine linear self-connections with techniques like gradient clipping or attention mechanisms to enhance performance.
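
To illustrate the attention part of that tip, the sketch below implements plain scaled dot-product attention in NumPy (shapes and variable names are illustrative); it gives the decoder a direct, short gradient path to every encoder position instead of forcing everything through the recurrent chain.

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    scores = Q @ K.T / np.sqrt(K.shape[-1])               # query-key similarity per source step
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)        # softmax over source positions
    return weights @ V                                    # weighted mix of source states

rng = np.random.default_rng(0)
Q = rng.standard_normal((1, 64))    # one decoder query
K = rng.standard_normal((30, 64))   # 30 encoder states (the source sentence)
V = rng.standard_normal((30, 64))
context = scaled_dot_product_attention(Q, K, V)
print(context.shape)                # (1, 64)
```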
