Special Session 122: Understanding the Learning of Deep Networks: Expressivity, Optimization, and Generalization

Optimization and Generalization of Gradient Descent for Shallow ReLU Networks

Yunwen Lei
The University of Hong Kong
People's Rep of China
Co-Author(s): Puyu Wang, Yiming Ying, Ding-Xuan Zhou
Abstract:
Understanding the generalization and optimization of neural networks is a longstanding problem in modern learning theory. Prior analyses often lead to risk bounds of order $1/\sqrt{n}$ for ReLU networks, where $n$ is the sample size. In this talk, we present a general optimization and generalization analysis for gradient descent applied to shallow ReLU networks. We establish convergence rates of order $1/T$ for gradient descent with $T$ iterations, and show that the gradient descent iterates remain inside local balls around either an initialization point or a reference point. We also derive improved Rademacher complexity estimates by exploiting the activation pattern of the ReLU function within these local balls.
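
To make the setting concrete, the following is a minimal sketch (not the authors' construction or exact parameterization) of full-batch gradient descent on a one-hidden-layer ReLU network for least-squares regression, which also tracks how far the iterates move from their initialization. The width, step size, iteration count, and synthetic data below are illustrative assumptions.

# Minimal sketch, assuming a fixed random output layer and squared loss;
# all hyperparameters and data here are illustrative, not the talk's setting.
import numpy as np

rng = np.random.default_rng(0)
n, d, m = 200, 10, 512                            # sample size, input dim, hidden width (assumed)
X = rng.standard_normal((n, d))
y = np.sin(X @ rng.standard_normal(d))            # synthetic regression targets

W0 = rng.standard_normal((m, d)) / np.sqrt(m)     # first-layer weights at initialization
a = rng.choice([-1.0, 1.0], size=m) / np.sqrt(m)  # fixed second layer (common in shallow-network analyses)
W = W0.copy()

eta, T = 0.5, 500                                 # step size and number of iterations (assumed)
for t in range(T):
    pre = X @ W.T                                 # (n, m) pre-activations
    act = np.maximum(pre, 0.0)                    # ReLU
    pred = act @ a                                # network outputs
    resid = pred - y                              # residuals of the squared loss
    # Gradient of (1/2n) * ||pred - y||^2 w.r.t. W, using the ReLU activation pattern (pre > 0)
    grad = ((resid[:, None] * (pre > 0)) * a).T @ X / n
    W -= eta * grad

print("empirical risk:", 0.5 * np.mean((np.maximum(X @ W.T, 0.0) @ a - y) ** 2))
print("distance of final iterate from initialization:", np.linalg.norm(W - W0))

The printed distance illustrates the kind of quantity the local-ball argument controls: the iterates' deviation from the initialization (or a reference point), which in turn restricts which ReLU activation patterns can change and enables sharper Rademacher complexity estimates.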