Abstract: |
Understanding the generalization and optimization of neural networks is a longstanding problem in modern learning theory. Prior analyses often lead to risk bounds of order $1/\sqrt{n}$ for ReLU networks, where $n$ is the sample size. In this talk, we present a general optimization and generalization analysis of gradient descent applied to shallow ReLU networks. We derive convergence rates of order $1/T$ for gradient descent with $T$ iterations, and show that the gradient descent iterates stay inside local balls around either the initialization point or a reference point. We also develop improved Rademacher complexity estimates by exploiting the activation pattern of the ReLU function within these local balls.
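Below is a minimal sketch, not the talk's exact setting, of the kind of training procedure the abstract refers to: full-batch gradient descent on a one-hidden-layer ReLU network under a squared loss. The function name `train_shallow_relu`, the width `m`, the step size `eta`, the initialization scale, and the returned distance from initialization are illustrative assumptions; the talk's precise loss, parameterization, and constants may differ.

```python
# A minimal sketch (assumed setup): T steps of full-batch gradient descent on
# 0.5 * mean (f(x) - y)^2, where f(x) = a^T relu(W x) is a width-m shallow
# ReLU network. The analysis described in the abstract tracks how far the
# iterates move from the initialization point W0 (a "local ball").
import numpy as np

def relu(z):
    return np.maximum(z, 0.0)

def train_shallow_relu(X, y, m=128, eta=0.01, T=1000, seed=0):
    rng = np.random.default_rng(seed)
    n, d = X.shape
    W = rng.normal(scale=1.0 / np.sqrt(d), size=(m, d))   # hidden-layer weights
    a = rng.choice([-1.0, 1.0], size=m) / np.sqrt(m)      # output-layer weights
    W0 = W.copy()                                          # initialization point
    for t in range(T):
        Z = X @ W.T                 # pre-activations, shape (n, m)
        H = relu(Z)                 # ReLU activations
        err = H @ a - y             # residuals, shape (n,)
        # Gradients of the empirical squared loss w.r.t. a and W
        grad_a = H.T @ err / n
        grad_W = ((err[:, None] * (Z > 0)) * a[None, :]).T @ X / n
        a -= eta * grad_a
        W -= eta * grad_W
    # Distance of the final iterate from initialization
    return W, a, np.linalg.norm(W - W0)
```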