Abstract: |
Recent years have seen an incredible increase in the use of artificial intelligence and machine learning (AI/ML) methods in essentially all areas of science, technology, engineering, humanities, and education. However, despite this widespread adoption of AI/ML models as analytical tools, relatively very little is known about the theoretical properties of these models. In particular, there is minimal theory justifying their use, despite a large and growing body of empirical evidence of their efficacy.
In this talk, I will attempt to partially bridge this gap between theory and performance for a particular machine learning problem: clustering. Clustering is a classical machine learning task where one seeks to partition a given data into subgroups of similar items called clusters. I will argue that the clustering task can be modeled as a family of combinatorial optimization problems, and illustrate how to use methods from semidefinite programming to design efficient and accurate numerical methods for clustering. In particular, I will propose several models for well-behaved or clusterable data, where we can expect these clustering heuristics to correctly identify the hidden cluster structure. |
|