| Abstract: |
| The curse of dimensionality renders statistical inference and machine learning in high dimensions intractable without additional assumptions on the underlying data. We consider geometric models for data that admit mathematical performance guarantees and efficient algorithms that break the curse. Specifically, this talk considers a family of data-driven metrics that balance density and geometry in the underlying data, known as Fermat distances. We study discrete graph operators based on these metrics and prove performance guarantees for spectral clustering with them. Fast algorithms based on Euclidean nearest-neighbor graphs are proposed, and connections with continuum operators on manifolds are developed. |
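
The density-geometry balance described above can be illustrated with a minimal sketch. In one common formulation of Fermat distances, the distance between two sample points is the shortest-path distance in a nearest-neighbor graph whose edge weights are Euclidean lengths raised to a power `p`; larger `p` rewards paths that hop through dense regions. The function name `fermat_distances` and the specific parameter choices below are illustrative assumptions, not the talk's implementation.

```python
import heapq
import numpy as np

def fermat_distances(X, p=2.0, k=10):
    """Approximate sample Fermat distances on a symmetric k-NN graph.

    Edge (i, j) gets weight ||X[i] - X[j]||^p; the Fermat distance
    between two samples is their shortest-path distance in this graph.
    This is an illustrative sketch, not the talk's exact construction.
    """
    n = len(X)
    # Pairwise Euclidean distances (dense; fine for small n).
    diff = X[:, None, :] - X[None, :, :]
    D = np.sqrt((diff ** 2).sum(-1))
    # k nearest neighbors of each point, excluding the point itself.
    nbrs = np.argsort(D, axis=1)[:, 1:k + 1]
    # Symmetrized adjacency list with p-th-power edge weights.
    adj = [[] for _ in range(n)]
    for i in range(n):
        for j in nbrs[i]:
            w = D[i, j] ** p
            adj[i].append((j, w))
            adj[j].append((i, w))
    # Dijkstra's algorithm from every source point.
    out = np.full((n, n), np.inf)
    for s in range(n):
        dist = out[s]
        dist[s] = 0.0
        heap = [(0.0, s)]
        while heap:
            d, u = heapq.heappop(heap)
            if d > dist[u]:
                continue
            for v, w in adj[u]:
                nd = d + w
                if nd < dist[v]:
                    dist[v] = nd
                    heapq.heappush(heap, (nd, v))
    return out
```

The resulting distance matrix could then feed a standard spectral clustering pipeline, e.g. by forming a Gaussian affinity from it and taking eigenvectors of the associated graph Laplacian.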
|