[Submitted on 19 Sep 2022]
Abstract: We bound the excess risk of interpolating deep linear networks trained using gradient flow. In a setting previously used to establish risk bounds for the minimum $\ell_2$-norm interpolant, we show that randomly initialized deep linear networks can closely approximate or even match the known bounds for the minimum $\ell_2$-norm interpolant. Our analysis also reveals that the interpolating deep linear model has exactly the same conditional variance as the minimum $\ell_2$-norm solution. Since noise affects the excess risk only through the conditional variance, this implies that depth does not improve the algorithm's ability to “hide the noise”. Our simulations confirm that aspects of our bounds reflect typical behavior for simple data distributions. We also find that similar phenomena appear in simulations with ReLU networks, although the situation there is more nuanced.
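As an illustration of the comparison described above (a minimal sketch, not the paper's own code or experimental setup), the following NumPy script trains a small deep linear network with plain gradient descent, used here as a discretization of gradient flow, on an overparameterized regression problem and compares its end-to-end linear map with the minimum $\ell_2$-norm interpolant computed via the pseudoinverse. The sample size, dimension, depth, initialization scale, learning rate, and step count are all illustrative assumptions.

    import numpy as np

    rng = np.random.default_rng(0)
    n, d, depth = 20, 50, 3                      # illustrative sizes, not from the paper
    X = rng.standard_normal((n, d))              # overparameterized design (n < d)
    y = X @ rng.standard_normal(d) + 0.1 * rng.standard_normal(n)   # noisy labels

    # Minimum l2-norm interpolant: beta = X^T (X X^T)^{-1} y, via the pseudoinverse.
    beta_min_norm = np.linalg.pinv(X) @ y

    # Deep linear network f(x) = x W_1 W_2 ... W_L with small random initialization.
    widths = [d] + [d] * (depth - 1) + [1]
    Ws = [0.02 * rng.standard_normal((widths[i], widths[i + 1])) for i in range(depth)]

    lr, steps = 5e-3, 30000                      # gradient descent as a proxy for gradient flow
    for _ in range(steps):
        acts = [X]
        for W in Ws:                             # forward pass, caching activations
            acts.append(acts[-1] @ W)
        resid = acts[-1].squeeze() - y           # interpolation error on the training set
        g = (resid / n)[:, None]                 # gradient of 0.5 * mean squared error
        grads = []
        for i in reversed(range(depth)):         # backpropagation through the linear layers
            grads.append(acts[i].T @ g)
            g = g @ Ws[i].T
        for W, gW in zip(Ws, reversed(grads)):
            W -= lr * gW

    beta_deep = np.linalg.multi_dot(Ws).squeeze()   # end-to-end linear map of the trained network
    print("train MSE:", np.mean((X @ beta_deep - y) ** 2))
    print("relative distance to min-norm interpolant:",
          np.linalg.norm(beta_deep - beta_min_norm) / np.linalg.norm(beta_min_norm))

With a small random initialization, the trained network's end-to-end map tends to land near the minimum $\ell_2$-norm interpolant; this sketch only illustrates that comparison and does not verify the paper's bounds, nor does it cover the ReLU simulations mentioned above.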
Submission history
From: Niladri Chatterji
[v1]
Mon, 19 Sep 2022 19:23:04 UTC (3,105 KB)