Abstract: |
The performance of machine learning (ML) models often depends on how the data are represented, which has motivated a resurgence of contrastive representation learning (CRL) as an approach to learning representation functions. Recently, CRL has shown remarkable empirical performance and can even surpass supervised learning models in domains such as computer vision and natural language processing.
In this talk, I present our recent progress in establishing learning-theoretic foundations for CRL. In particular, we address the following two theoretical questions: 1) how does the generalization behavior of downstream ML models benefit from a representation function built from positive and negative pairs, and 2) how does the number of negative examples affect learning performance?
Specifically, we show that generalization bounds for contrastive learning do not depend on the number k of negative examples, up to logarithmic terms. Our analysis uses structural results on empirical covering numbers and Rademacher complexities to exploit the Lipschitz continuity of loss functions. For self-bounding Lipschitz loss functions, we further improve our results by developing optimistic bounds that imply fast rates under a low-noise condition. We apply our results to learning with both linear representations and nonlinear representations given by deep neural networks, and in both cases we derive explicit Rademacher complexity bounds.
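To fix ideas, a commonly studied contrastive loss with k negative examples (one standard instantiation; the exact loss considered in the talk may differ) is the logistic loss
\[
\ell\bigl(f; x, x^{+}, \{x_i^{-}\}_{i=1}^{k}\bigr)
  = \log\Bigl(1 + \sum_{i=1}^{k}
      \exp\bigl(f(x)^{\top} f(x_i^{-}) - f(x)^{\top} f(x^{+})\bigr)\Bigr),
\]
where x is an anchor, x^{+} a positive example, and x_1^{-}, ..., x_k^{-} the negative examples; losses of this form are Lipschitz in the k+1 similarity scores, which is the kind of structure our analysis exploits.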