Special Session 78: Mathematics of Data Science and Applications

The Theory of Parameter Condensation in Neural Networks

Tao Luo
Shanghai Jiao Tong University
People's Republic of China
Abstract:
In this talk, we will first introduce the phenomenon of parameter condensation in neural networks, which refers to the tendency of certain parameters to converge towards the same values during training. Then, for certain types of networks, we prove that condensation occurs in the early stages of training. We further analyze which hyperparameters and training strategies influence parameter condensation. In some cases, we even provide a phase diagram that delineates whether parameter condensation occurs. We will also briefly discuss the relationship between parameter condensation and generalization ability. Finally, towards the end of training, we study the set of global minima and present a detailed analysis of its geometric structure and convergence properties.
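As a concrete illustration of the phenomenon (not the talk's own construction), the minimal sketch below trains a small two-layer tanh network by plain gradient descent and tracks the mean absolute cosine similarity between the hidden neurons' input weight vectors; a value near 1 indicates that the weights have condensed onto a few shared directions. The width, the small initialization scale, the learning rate, and the `condensation_index` metric are all illustrative assumptions, not settings taken from the talk.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy regression data in two input dimensions.
n, d = 64, 2
X = rng.uniform(-1.0, 1.0, size=(n, d))
y = np.sin(np.pi * X[:, 0]) * X[:, 1]

# Two-layer network f(x) = sum_k a_k * tanh(w_k . x + b_k).
m = 20          # hidden width (illustrative choice)
scale = 1e-2    # small initialization scale (assumed regime for this demo)
W = scale * rng.standard_normal((m, d))
b = scale * rng.standard_normal(m)
a = scale * rng.standard_normal(m)

def forward(X):
    H = np.tanh(X @ W.T + b)   # (n, m) hidden activations
    return H @ a, H

def condensation_index(W):
    """Mean absolute cosine similarity between distinct neurons' input
    weight vectors; near 1 when weights align into a few directions."""
    U = W / (np.linalg.norm(W, axis=1, keepdims=True) + 1e-12)
    C = U @ U.T
    return np.abs(C[~np.eye(len(W), dtype=bool)]).mean()

lr = 0.5
for step in range(20001):
    pred, H = forward(X)
    err = pred - y                            # (n,) residuals
    ga = H.T @ err / n                        # grad wrt output weights
    gZ = np.outer(err, a) * (1.0 - H**2) / n  # backprop through tanh
    gW = gZ.T @ X                             # grad wrt input weights
    gb = gZ.sum(axis=0)                       # grad wrt biases
    a -= lr * ga
    W -= lr * gW
    b -= lr * gb
    if step % 5000 == 0:
        loss = 0.5 * np.mean(err**2)
        print(f"step {step:6d}  loss {loss:.4e}  "
              f"condensation {condensation_index(W):.3f}")
```

In small-initialization regimes of the kind the abstract's phase diagram concerns, the printed condensation index would be expected to rise well above its initial value early in training as the input weights align; increasing `scale` is one simple way to probe how initialization influences whether condensation occurs.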