During training, a neural network's loss decreases on the training set for 50 epochs, but after epoch 20, the validation loss starts increasing while training loss continues to fall. A junior engineer proposes increasing the number of layers to improve performance. What is the actual problem, and why would the proposed solution make it worse?
Training loss that keeps falling while validation loss rises is the classic signature of overfitting: the model is fitting patterns specific to the training data that do not generalize. Adding more layers increases the model's capacity to memorize, so it would make the overfitting worse rather than better. The appropriate interventions are regularization (dropout, weight decay), early stopping (stop near epoch 20, where validation loss is lowest, or restore the checkpoint from that point), more training data, or reducing model complexity. The other candidate explanations do not fit these curves: vanishing gradients would cause training loss to plateau, not continue decreasing, and an oscillating learning rate would show erratic loss, not a smooth divergence between the training and validation curves.
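The early-stopping rule mentioned above can be sketched in a few lines of plain Python. This is a minimal illustration, not a reference implementation: the function name `early_stopping_epoch`, the `patience` and `min_delta` parameters, and the synthetic validation curve are all assumptions made for the example, chosen only to mimic the scenario in the question (validation loss bottoming out at epoch 20).

```python
def early_stopping_epoch(val_losses, patience=5, min_delta=0.0):
    """Return (best_epoch, stop_epoch), both 1-indexed.

    best_epoch: epoch with the lowest validation loss seen before stopping.
    stop_epoch: epoch at which training would halt, i.e. the first epoch
                that is `patience` epochs past the best one without any
                improvement (or the last epoch if that never happens).
    """
    best_loss = float("inf")
    best_epoch = 0
    stop_epoch = len(val_losses)
    for epoch, loss in enumerate(val_losses, start=1):
        if loss < best_loss - min_delta:
            # Validation loss improved: remember this epoch as the best so far.
            best_loss = loss
            best_epoch = epoch
        elif epoch - best_epoch >= patience:
            # No improvement for `patience` epochs: stop training here.
            stop_epoch = epoch
            break
    return best_epoch, stop_epoch


# Synthetic (made-up) validation curve: falls until epoch 20, then rises.
falling = [1.0 - 0.03 * e for e in range(1, 21)]
rising = [0.4 + 0.01 * (e - 20) for e in range(21, 51)]
val_losses = falling + rising

best_epoch, stop_epoch = early_stopping_epoch(val_losses, patience=5)
print(best_epoch, stop_epoch)  # prints "20 25"
```

In a real training loop the weights saved at `best_epoch` would be restored after stopping, so the extra `patience` epochs cost some compute but do not degrade the final model.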