| Nicolas Garcia Trillos |
University of Wisconsin-Madison, USA |
| Co-Author(s): Sixu Li (UW-Madison), Jan Peszek (Warsaw), Trevor Teolis (Rice), Konstantin Riedl (Oxford), Jake Maranzatto (Maryland), Semih Akkoc (Maryland), and Sennur Ulukus (Maryland) |
|
| Abstract: |
| In this talk, I will discuss a collective dynamics perspective on transformers, the architecture at the heart of modern large language models. In particular, we will discuss how dimensionality reduction techniques akin to those used in the study of the Kuramoto model can be employed to explore the rich structure that the evolution of the distributions of tokens (particles) can have when selecting different values for the key, query, and value matrices parameterizing a transformer model consisting of compositions of multiple self-attention layers. This perspective will allow us to explore token dynamics beyond the gradient flow setting obtained by very specific choices of model parameters and to uncover parameter choices inducing cyclical behavior, consensus formation without stability, and Hamiltonian dynamics. While our theoretical discussion will focus exclusively on 2-dimensional token embeddings, I will also discuss numerical experiments that suggest that our theoretical findings can be extrapolated to general multi-dimensional settings. |
|
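As a rough illustration of the 2-dimensional setting described in the abstract, the sketch below simulates tokens constrained to the unit circle evolving under a continuous-time self-attention interaction. This is a minimal sketch under stated assumptions: the specific dynamics, the choice Q = K = V = I, and the helper `attention_velocity` are illustrative, not the exact model of the talk. With identity matrices, projecting the attention output onto the tangent of the circle yields a Kuramoto-like attractive coupling, and tokens initialized within a half circle reach consensus.

```python
import numpy as np

def attention_velocity(theta, Q, K, V, beta=1.0):
    """Tangential velocity d(theta)/dt for tokens x_i = (cos t_i, sin t_i)
    on the unit circle under an illustrative self-attention interaction."""
    X = np.stack([np.cos(theta), np.sin(theta)], axis=1)    # (n, 2) token embeddings
    logits = beta * (X @ Q.T) @ (X @ K.T).T                 # <Q x_i, K x_j> scores
    logits -= logits.max(axis=1, keepdims=True)             # numerically stable softmax
    A = np.exp(logits)
    A /= A.sum(axis=1, keepdims=True)                       # row-stochastic attention weights
    F = A @ (X @ V.T)                                       # aggregated value vectors
    tangent = np.stack([-np.sin(theta), np.cos(theta)], axis=1)
    return np.sum(F * tangent, axis=1)                      # project onto tangent of S^1

rng = np.random.default_rng(0)
n = 16
theta = rng.uniform(0.0, np.pi, n)    # initialize tokens inside a half circle
Q = K = V = np.eye(2)                 # illustrative choice: identity parameter matrices
dt, steps = 0.05, 2000
for _ in range(steps):                # explicit Euler time stepping
    theta = theta + dt * attention_velocity(theta, Q, K, V)

# order parameter R = |mean e^{i theta}|: R close to 1 indicates consensus
R = abs(np.mean(np.exp(1j * theta)))
```

With V = I the projected velocity of token i reduces to a weighted sum of sin(theta_j - theta_i) terms, which is exactly the attractive Kuramoto form; replacing Q, K, V with other matrices is the knob the abstract describes for producing cyclical, unstable-consensus, or Hamiltonian behavior.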